Abstract

In this paper, under mild assumptions, we derive a law of large numbers, a central limit theorem with an error estimate, an almost sure invariance principle and a variant of the Chernoff bound in finite-state hidden Markov models. These limit theorems are of interest in certain areas of statistics and information theory. In particular, we apply the limit theorems to derive the rate of convergence of the maximum likelihood estimator in finite-state hidden Markov models.

1 Main Results and Related Work

Consider a discrete memoryless channel with a finite input alphabet $\mathcal{Y}$ and a finite output alphabet $\mathcal{Z}$. Assume that, at each time slot, the channel is characterized by the channel transition probability matrix $\Pi = (p(z|y))$. Let $Y = (Y_i : i \in \mathbb{Z})$ be the input process over $\mathcal{Y}$, which is a stationary Markov chain with transition probability matrix $\Delta$. Let $Z$ denote the output process over $\mathcal{Z}$, which is often referred to as a hidden Markov chain. Assume that $\Delta$ is analytically parameterized by $\theta \in \Omega$, where $\Omega$ is an open, bounded and connected subset of $\mathbb{R}^m$.

Assume that the true parameter of $\Delta$ is $\theta_0$, which is often assumed unknown in a statistical context. For any $l \in \mathbb{N} \cup \{0\}$, we are interested in the limiting probabilistic behavior of the $l$-th derivative of $\log p^{\theta}(Z_1^n)$ with respect to any $\theta \in \Omega$, denoted by $D_\theta^l \log p^{\theta}(Z_1^n)$; here $Z_1^n$ is used to denote the sequence of random variables $(Z_1, Z_2, \ldots, Z_n)$, and a similar notational convention will be followed in the sequel. We will prove limit theorems for appropriately normalized versions of $D_\theta^l \log p^{\theta}(Z_1^n)$, for any fixed $l$ and any $\theta \in \Omega$. Here, we remark that, only for notational convenience, we treat $\theta$ as a one-dimensional variable throughout this paper.

Consider the following two conditions: 

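(I) $\Pi$ is a strictly positive matrix, and for any $\theta \in \Omega$, $\Delta(\theta)$ is irreducible and aperiodic;

(II) for any $\theta \in \Omega$, $\sigma^{(l)}(\theta) \triangleq \lim_{n\to\infty} \sigma_n^{(l)}(\theta)/\sqrt{n} > 0$, where $\sigma_n^{(l)}(\theta) = \sqrt{\mathrm{Var}(D_\theta^l \log p^{\theta}(Z_1^n))}$ (the existence of this limit under Condition (I) will be established later).

We also define
$$L^{(l)}(\theta) = \lim_{n\to\infty} \frac{E[D_\theta^l \log p^{\theta}(Z_1^n)]}{n}$$
when the limit exists.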

The following theorem is an analog of the law of large numbers (LLN).

Theorem 1.1. Assume Condition (I). Then, $L^{(l)}(\theta)$ is well-defined, and for any $\theta \in \Omega$,
$$\frac{D_\theta^l \log p^{\theta}(Z_1^n)}{n} \to L^{(l)}(\theta) \quad \text{with probability } 1.$$

For the case $l = 0$, Theorem 1.1 has already been observed in [2], where the convergence is used to prove the consistency of the maximum likelihood estimator (MLE) in a hidden Markov model. Note that when $\theta = \theta_0$, we have $L^{(0)}(\theta_0) = -H^{\theta_0}(Z)$, where $H^{\theta_0}(Z)$ denotes the entropy rate of the hidden Markov chain $Z$ at the true parameter $\theta_0$. So, Theorem 1.1 is a (rather) special case of the celebrated Shannon-McMillan-Breiman theorem, which only assumes the stationarity and ergodicity of $Z$. The entropy rate of a hidden Markov chain is of great importance in many areas of mathematics and physics; in particular, the computation of $H^{\theta_0}(Z)$ is a first step towards computing the capacity of a finite-state channel in information theory. Unfortunately, it is notoriously difficult to compute such a fundamental quantity (see [15] [2Z] and references therein). Recently, based on the Shannon-McMillan-Breiman theorem, efficient Monte Carlo methods for approximating $H^{\theta_0}(Z)$ were proposed independently by Arnold and Loeliger [1], Pfister, Soriaga and Siegel [30], and Sharma and Singh [39].
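The Monte Carlo idea behind these methods can be sketched as follows: simulate a long output sequence $z_1^n$ and evaluate $-\frac{1}{n}\log p^{\theta_0}(z_1^n)$ with the normalized forward recursion; by Theorem 1.1 (case $l = 0$) this converges almost surely to $H^{\theta_0}(Z)$. The Python sketch below is a minimal illustration under our own assumptions, not the implementation of [1], [30] or [39]; the helper names (`stationary`, `simulate_hmm`, `log_likelihood`, `entropy_rate_estimate`) and the example matrices are hypothetical.

```python
# Illustrative sketch only: helper names and example matrices are hypothetical.
import numpy as np


def stationary(P):
    """Stationary distribution of an irreducible, aperiodic stochastic matrix P."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()


def simulate_hmm(Delta, Pi, n, rng):
    """Sample Z_1^n: hidden chain Y ~ Delta (started stationary), Z_i ~ Pi[Y_i]."""
    y = rng.choice(len(Delta), p=stationary(Delta))
    z = np.empty(n, dtype=int)
    for i in range(n):
        if i > 0:
            y = rng.choice(len(Delta), p=Delta[y])
        z[i] = rng.choice(Pi.shape[1], p=Pi[y])
    return z


def log_likelihood(Delta, Pi, z):
    """log p(z_1^n) by the normalized forward recursion."""
    alpha = stationary(Delta)          # distribution of Y_1
    total = 0.0
    for zi in z:
        joint = alpha * Pi[:, zi]      # p(y_i, z_i | z_1^{i-1})
        c = joint.sum()                # p(z_i | z_1^{i-1})
        total += np.log(c)
        alpha = (joint / c) @ Delta    # distribution of Y_{i+1} given z_1^i
    return total


def entropy_rate_estimate(Delta, Pi, n, seed=0):
    """Monte Carlo estimate -log p(Z_1^n) / n of the entropy rate (in nats)."""
    rng = np.random.default_rng(seed)
    z = simulate_hmm(Delta, Pi, n, rng)
    return -log_likelihood(Delta, Pi, z) / n
```

For instance, with `Delta = np.array([[0.9, 0.1], [0.2, 0.8]])` and `Pi = np.array([[0.8, 0.2], [0.3, 0.7]])`, `entropy_rate_estimate(Delta, Pi, 5000)` should return a value strictly between $0$ and $\log 2$; Theorem 1.2 below quantifies how such estimates fluctuate around $H^{\theta_0}(Z)$.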

We will prove the following central limit theorem (CLT) for $D_\theta^l \log p^{\theta}(Z_1^n)$ with an error estimate, which is often referred to as a Berry-Esseen bound [H] [16] in probability theory. Here, we remark that, to avoid cumbersome notation, we often use $C$ to denote a positive constant whose dependence on the various parameters is suppressed; it may not be the same on each appearance.

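Theorem 1.2. Assume Conditions (I) and (II) and consider any given compact subset $\Omega_0 \subset \Omega$. For any $\varepsilon > 0$, there exists $C > 0$ such that for any $n$ and any $\theta \in \Omega_0$,
$$\sup_x \left| P\left( \frac{D_\theta^l \log p^{\theta}(Z_1^n) - nL^{(l)}(\theta)}{\sigma_n^{(l)}(\theta)} < x \right) - G(x) \right| \le C n^{-1/4+\varepsilon},$$
where $G(x) = \int_{-\infty}^{x} (2\pi)^{-1/2} \exp(-y^2/2)\, dy$.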



For the case $l = 1$, Theorem 1.2 (without the Berry-Esseen bound) was first shown in [2], which, together with Theorem 1.1 for the case $l = 2$, can be further used to derive the asymptotic normality of the MLE for a hidden Markov model. This asymptotic normality result is of great importance to the statistical estimation aspects of hidden Markov models, and has been generalized extensively in [H] [E] [TH] [H] [25] [28] [M] [37].

Theorem 1.2 for the case $l = 0$ and $\theta = \theta_0$ (again without the Berry-Esseen bound) has been considered in more probabilistic settings as well: a CLT for $\log p^{\theta_0}(Z_1^n)$ assuming $Z$ is a Markov chain was first proven in [42]; this result is further generalized to obtain a refinement of the Shannon-McMillan-Breiman theorem in [23] under some mixing assumptions; under somewhat similar conditions, an almost sure invariance principle, a deep result which, among many other applications, implies a CLT, has been established in [33]; the almost sure invariance principle is used to study the asymptotic behavior of the so-called recurrence and waiting times in [2^], where a CLT for $\log p^{\theta_0}(Z_1^n)$ is embedded in the main results.

In a more information-theoretic context, a CLT for $\log p^{\theta_0}(Z_1^n)$ is derived in [31] as a corollary of a CLT for the top Lyapunov exponent of a product of random matrices; a functional CLT is also established in [2^]. In essence, both of these CLTs are proved using effective martingale approximations of $\log p^{\theta_0}(Z_1^n)$ (see [17] for this standard technique).

There is also a large body of work (see [21] [EO] and references therein) on variants of the CLT for the empirical entropy of some ergodic mappings in the language of ergodic theory, among which, of great relevance to this work are [21] [20], where CLTs with Berry-Esseen bounds are derived. Here, we remark that there are minor mistakes in the proof of the main results in [21]; it appears that a modified proof, together with stronger assumptions, can only yield weaker results than claimed in [21].

Note that the error estimate in the CLTs is of great significance in many scenarios, such as characterizing the speed of convergence of the above-mentioned Monte Carlo simulations in [1] [30] [39] and deriving non-asymptotic coding theorems in information theory [H]. Among all the previously mentioned related work, only [21] [20] give error estimates for the CLTs. Compared to these two works, where only some mixing conditions are assumed for $Z$, our assumptions are rather strong. On the other hand, our CLT is considerably stronger in the sense that it essentially holds for a class of functions including $\log p^{\theta}(Z_1^n)$ and its derivatives, with a tighter error estimate.

Following Philipp and Stout [33], we prove the following almost sure invariance principle.

Theorem 1.3. Assume Conditions (I) and (II). For any given $\theta \in \Omega$, define a continuous parameter process $\{S(t), t \ge 0\}$ by setting
$$S(t) = D_\theta^l \log p^{\theta}(Z_1^{\lfloor t \rfloor}) - \lfloor t \rfloor L^{(l)}(\theta).$$
Then, without changing the distribution of $\{S(t), t \ge 0\}$, we can redefine the process $\{S(t), t \ge 0\}$ on a richer probability space together with the standard Brownian motion $\{B(t), t \ge 0\}$ such that for any $\varepsilon > 0$,
$$S(t) - B((\sigma^{(l)}(\theta))^2 t) = O(t^{1/3+\varepsilon}) \quad \text{a.s. as } t \to \infty.$$

As elaborated in [33], an almost sure invariance principle is a fundamental theorem with many applications, which include, besides a CLT and some large deviation results, a law of the iterated logarithm (LIL). The following LIL immediately follows from Theorem 1.3.

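Theorem 1.4. Assume Conditions (I) and (II). For any given $\theta \in \Omega$, we have
$$\limsup_{n\to\infty} \frac{D_\theta^l \log p^{\theta}(Z_1^n) - nL^{(l)}(\theta)}{\left(2n(\sigma^{(l)}(\theta))^2 \log\log\left(n(\sigma^{(l)}(\theta))^2\right)\right)^{1/2}} = 1 \quad \text{a.s.}$$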

Theorem 1.4 is not completely new: the almost sure invariance principle in [33], which is established under much weaker conditions, implies Theorem 1.4 for the case $l = 0$. In [29], it has been shown that, under reasonable assumptions, a CLT with a sharp enough error estimate implies an LIL for i.i.d. sequences of random variables. For possibly dependent sequences of random variables, Petrov's result may not be directly applied to derive an LIL; however, the spirit of the proof can be cautiously followed to establish Theorem 1.4 as an alternative approach (see [34]). Using this idea, a law of the iterated logarithm (again for the case $l = 0$) has also been noted in [21] [20] under some mixing assumptions.

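We also prove the following variant of the Chernoff bound (see [11]), giving a sub-exponentially decaying upper bound for the tail probability of $D_\theta^l \log p^{\theta}(Z_1^n) - nL^{(l)}(\theta)$.

Theorem 1.5. Assume Conditions (I) and (II) and consider any given compact subset $\Omega_0 \subset \Omega$. For any $x > 0$ and any $0 < \varepsilon < 1$, there exist $C > 0$, $0 < \gamma < 1$ such that for any $n$ and any $\theta \in \Omega_0$,
$$P\left( \frac{D_\theta^l \log p^{\theta}(Z_1^n) - nL^{(l)}(\theta)}{n} \ge x \right) \le C\gamma^{n^{\varepsilon}}.$$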

Let $\hat\theta_n \in \Omega$ be the $n$-th order maximum likelihood estimator (MLE) for the considered hidden Markov model, that is,
$$\hat\theta_n = \arg\max_{\theta \in \Omega} \log p^{\theta}(Z_1^n).$$
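To make the definition of $\hat\theta_n$ concrete, the following Python sketch computes a grid-search approximation of the MLE for a hypothetical one-parameter family: a symmetric two-state chain $\Delta(\theta)$ with flip probability $\theta$, observed through a fixed, strictly positive binary channel $\Pi$. The family, the grid and the function names are our own illustrative assumptions; the log-likelihood is evaluated with the normalized forward recursion, vectorized over the grid.

```python
# Illustrative sketch only: the family delta(theta), the grid and all names are hypothetical.
import numpy as np


def delta(theta):
    """Hypothetical family Delta(theta): symmetric two-state chain with flip probability theta."""
    return np.array([[1.0 - theta, theta], [theta, 1.0 - theta]])


def simulate(theta0, Pi, n, rng):
    """Sample Z_1^n under theta0; the uniform law is stationary for every Delta(theta)."""
    D = delta(theta0)
    y = rng.integers(2)
    z = np.empty(n, dtype=int)
    for i in range(n):
        if i > 0:
            y = rng.choice(2, p=D[y])
        z[i] = rng.choice(Pi.shape[1], p=Pi[y])
    return z


def log_likelihoods(thetas, Pi, z):
    """log p^theta(z_1^n) for every theta in `thetas` (normalized forward recursion)."""
    D = np.stack([delta(t) for t in thetas])       # shape (G, 2, 2)
    alpha = np.full((len(thetas), 2), 0.5)          # stationary start for each theta
    ll = np.zeros(len(thetas))
    for zi in z:
        joint = alpha * Pi[:, zi]                   # p^theta(y_i, z_i | z_1^{i-1})
        c = joint.sum(axis=1)                       # p^theta(z_i | z_1^{i-1})
        ll += np.log(c)
        alpha = np.einsum("gi,gij->gj", joint / c[:, None], D)
    return ll


def grid_mle(Pi, z, grid=None):
    """Grid-search approximation of argmax_theta log p^theta(z_1^n)."""
    if grid is None:
        grid = np.linspace(0.02, 0.98, 97)
    return grid[np.argmax(log_likelihoods(grid, Pi, z))]
```

The grid search is chosen only for transparency; a continuous optimizer over $\Omega$ would normally be used instead.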

The consistency of the MLE in hidden Markov models has been extensively discussed in statistical contexts (see representative work in [2] [25] [6]). As one of the principal applications of the limit theorems above, assuming the consistency of the MLE, the following theorem further gives the rate of convergence of the estimators $\hat\theta_n$ to the true parameter $\theta_0$.

Theorem 1.6. Assume Conditions (I) and (II). Assume that there is a compact subset $\Omega_0 \subset \Omega$ such that $\Omega_0$ contains $\theta_0$ and $L^{(2)}(\theta)$ is non-singular for any $\theta \in \Omega_0$. Then, on the event that "$\Omega_0$ contains all $\hat\theta_n$" and "$\hat\theta_n$ converges to $\theta_0$", for any $x, \varepsilon > 0$, there exists $C > 0$ such that

Pi\0n-Oo\ >x)< Cn-^l^^\ 

2 Limit Theorems under Exponential Mixing and Forgetting Conditions

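A stationary stochastic process $T = (T_n)_{n \in \mathbb{Z}}$ is said to be $\psi$-mixing if
$$\psi(n) = \sup_{U \in \mathcal{B}(T_{-\infty}^{0}),\ V \in \mathcal{B}(T_n^{\infty}),\ P(U) > 0,\ P(V) > 0} \frac{|P(V|U) - P(V)|}{P(V)} \to 0 \quad \text{as } n \to \infty,$$
where $\mathcal{B}(T_i^j)$ denotes the $\sigma$-field generated by $\{T_k : k = i, i+1, \ldots, j\}$. Let $Z = (Z_n)_{n \in \mathbb{Z}}$ be a stationary $\psi$-mixing sequence of random variables over a finite alphabet $\mathcal{Z}$ satisfying the following property:

(a) [exponential mixing] There exist $C > 0$, $0 < \lambda < 1$ such that
$$\psi(n) \le C\lambda^n$$
for all $n$.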
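For intuition about Condition (a), consider the simpler case in which the process is itself a stationary finite-state Markov chain with transition matrix $P$ and stationary distribution $\pi$. By the Markov property, one can check that the supremum in the definition of $\psi(n)$ is then attained at single-state events, so that $\psi(n) = \max_{i,j} |P^n(i,j)/\pi_j - 1|$, which decays geometrically when $P$ is irreducible and aperiodic. The Python sketch below evaluates this formula; the helper names are hypothetical.

```python
# Illustrative sketch only (Markov-chain special case); helper names are hypothetical.
import numpy as np


def stationary(P):
    """Stationary distribution of an irreducible, aperiodic stochastic matrix P."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()


def psi_markov(P, n_max):
    """psi(n) = max_{i,j} |P^n(i,j) / pi_j - 1| for n = 1, ..., n_max."""
    pi = stationary(P)
    Pn = np.eye(len(P))
    psi = []
    for _ in range(n_max):
        Pn = Pn @ P                                   # n-step transition matrix
        psi.append(np.max(np.abs(Pn / pi[None, :] - 1.0)))
    return np.array(psi)
```

For a two-state chain one has $P^n = \Pi_\infty + \mu^n (I - \Pi_\infty)$, where $\Pi_\infty$ has rows equal to $\pi$ and $\mu$ is the second eigenvalue, so $\psi(n)$ is exactly a constant times $|\mu|^n$.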
Let $\mathcal{Z}^*$ be the set of all finite words over $\mathcal{Z}$ and let $f : \mathcal{Z}^* \to \mathbb{R}$ be a function satisfying the following properties:

(b) There exist $C, C' > 0$ such that for all $i \ge 0$ and all $z_{-i}^0 \in \mathcal{Z}^{i+1}$,
$$C \le f(z_0|z_{-i}^{-1}) \le C'.$$

(c) [exponential forgetting] There exist $C > 0$, $0 < \rho < 1$ such that for any two hidden Markov sequences $z_{-m}^0$, $\hat z_{-\hat m}^0$ with $z_{-n}^0 = \hat z_{-n}^0$ (here $m, \hat m \ge n \ge 0$), we have
$$|f(z_0|z_{-m}^{-1}) - f(\hat z_0|\hat z_{-\hat m}^{-1})| \le C\rho^n.$$
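Condition (c) can be observed numerically for the prediction probabilities $f(z_0|z_{-m}^{-1}) = p(z_0|z_{-m}^{-1})$ of a hidden Markov chain with strictly positive matrices: two pasts that share their last $n$ symbols yield predictions that differ by an amount decaying geometrically in $n$. The sketch below uses hypothetical matrices and helper names and computes the predictions with the normalized forward filter, started from the stationary distribution of the hidden chain; it illustrates, but does not prove, the forgetting property.

```python
# Illustrative sketch only: matrices, prefixes and helper names are hypothetical.
import numpy as np


def stationary(P):
    """Stationary distribution of an irreducible, aperiodic stochastic matrix P."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()


def predictive(Delta, Pi, past):
    """Distribution of the next output given `past` = (z_{-m}, ..., z_{-1})."""
    alpha = stationary(Delta)             # law of the hidden state at time -m
    for z in past:
        post = alpha * Pi[:, z]
        post = post / post.sum()          # filter: hidden state given the data so far
        alpha = post @ Delta              # one-step prediction of the hidden state
    return alpha @ Pi                     # law of the next output symbol


def forgetting_gaps(Delta, Pi, common, prefix_a, prefix_b, z0):
    """|f(z0 | prefix_a + common[-n:]) - f(z0 | prefix_b + common[-n:])|, n = 1..len(common)."""
    gaps = []
    for n in range(1, len(common) + 1):
        tail = list(common[-n:])
        fa = predictive(Delta, Pi, list(prefix_a) + tail)[z0]
        fb = predictive(Delta, Pi, list(prefix_b) + tail)[z0]
        gaps.append(abs(fa - fb))
    return np.array(gaps)
```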

Define
$$X_i = f(Z_i|Z_1^{i-1}) - E[f(Z_i|Z_1^{i-1})]$$
and
$$S_n = \sum_{i=1}^{n} X_i, \qquad \sigma_n^2 = \mathrm{Var}(S_n).$$

We will also consider

(d) $\sigma = \lim_{n\to\infty} \sigma_n/\sqrt{n} > 0$ (the existence of this limit under Conditions (a), (b) and (c) will be established in Lemma 3.3 and Remark 3.4).

We will prove the following theorems under Conditions (a), (b), (c) and (d). Not only can these theorems be used to prove the main results in Section 1, but they are also of interest in their own right. The first theorem is a law of large numbers.

Theorem 2.1. Assume Conditions (b) and (c). With probability 1,
$$\frac{S_n}{n} \to 0 \quad \text{as } n \to \infty.$$

We will also prove the following central limit theorem with a Berry-Esseen bound.

Theorem 2.2. Assume Conditions (a), (b), (c) and (d). For any $\varepsilon > 0$, there exists $C > 0$ such that for any $n$,
$$\sup_x |P(S_n/\sigma_n < x) - G(x)| \le C n^{-1/4+\varepsilon},$$
where $G(x) = \int_{-\infty}^{x} (2\pi)^{-1/2} \exp(-y^2/2)\, dy$.

The following theorem is an almost sure invariance principle.

Theorem 2.3. Assume Conditions (a), (b), (c) and (d). Define a continuous parameter process $\{S(t), t \ge 0\}$ by setting
$$S(t) = \sum_{n \le t} X_n.$$
Then, without changing the distribution of $\{S(t), t \ge 0\}$, we can redefine the process $\{S(t), t \ge 0\}$ on a richer probability space together with the standard Brownian motion $\{B(t), t \ge 0\}$ such that for any $\varepsilon > 0$,
$$S(t) - B(\sigma^2 t) = O(t^{1/3+\varepsilon}) \quad \text{a.s. as } t \to \infty.$$
As one of many applications of Theorem 2.3, the following law of the iterated logarithm immediately follows.

Theorem 2.4. Assume Conditions (a), (b), (c) and (d). Then, we have
$$\limsup_{n\to\infty} \frac{S_n}{(2n\sigma^2 \log\log(n\sigma^2))^{1/2}} = 1 \quad \text{a.s.}$$

We also prove the following variant of the Chernoff bound (see [11]), giving a sub-exponentially decaying upper bound for the tail probability of $S_n$.

Theorem 2.5. Assume Conditions (a), (b) and (c). For any $x > 0$ and any $0 < \varepsilon < 1$, there exist $C > 0$, $0 < \gamma < 1$ such that for any $n$,
$$P(S_n/n \ge x) \le C\gamma^{n^{\varepsilon}}.$$



3 Proofs of the Theorems in Section 2
3.1 Key Lemmas

From now on, we rewrite $f(z_j|z_1^{j-1}) - E[f(Z_j|Z_1^{j-1})]$ as $g(z_1^j)$ for notational simplicity.

The following lemma shows that for a fixed $j \ge 0$, $E[X_iX_{i+j}]$ converges exponentially fast as $i \to \infty$, and for any $i < j$, $E[X_iX_j]$ decays exponentially in $j - i$.

Lemma 3.1. Assume Conditions (a), (b) and (c).

1. There exist $C > 0$, $0 < \rho < 1$ (here $\rho$ is as in Condition (c)) such that for all $i, j \ge 0$,
$$|E[X_{i+1}X_{i+1+j}] - E[X_iX_{i+j}]| \le C\rho^i.$$

2. There exist $C > 0$, $0 < \theta < 1$ such that for any positive $i < j$,
$$|E[X_iX_j]| \le C\theta^{j-i}.$$

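Proof. 1. By the stationarity of $Z$, simple computations lead to
$$\begin{aligned} E[X_{i+1}X_{i+1+j}] - E[X_iX_{i+j}] &= \sum_{z_{-i-j}^{0}} p(z_{-i-j}^{0})\, g(z_{-i-j}^{-j}) \left( g(z_{-i-j}^{0}) - g(z_{-i-j+1}^{0}) \right) \\ &\quad + \sum_{z_{-i-j}^{0}} p(z_{-i-j}^{0}) \left( g(z_{-i-j}^{-j}) - g(z_{-i-j+1}^{-j}) \right) g(z_{-i-j+1}^{0}), \end{aligned}$$
where, with a slight abuse of notation, $g(z_{-s}^{t}) = f(z_t|z_{-s}^{t-1}) - E[f(Z_t|Z_{-s}^{t-1})]$. By Condition (b), $f(z_0|z_{-i}^{-1})$ and $E[f(Z_0|Z_{-i}^{-1})]$ are all bounded from above and below uniformly in $i$. It then follows from this fact and Condition (c) that there exist $C > 0$, $0 < \rho < 1$ such that
$$|E[X_{i+1}X_{i+1+j}] - E[X_iX_{i+j}]| \le C\rho^i.$$
Part 1 of the lemma then immediately follows.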

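2. Let $l = \lfloor (i+j)/2 \rfloor$. By Conditions (a) and (c), there exist $0 < \rho, \lambda < 1$ such that
$$\begin{aligned} E[X_iX_j] &= \sum_{z_1^j} p(z_1^j)\, g(z_1^i)\, g(z_1^j) \\ &= \sum_{z_1^i,\, z_l^j} p(z_1^i)\, g(z_1^i)\, p(z_l^j|z_1^i)\, g(z_l^j) + O(\rho^{j-l}) \\ &= \sum_{z_1^i,\, z_l^j} p(z_1^i)\, g(z_1^i) \left( p(z_l^j) + O(\lambda^{l-i})\, p(z_l^j) \right) g(z_l^j) + O(\rho^{j-l}) \\ &= \sum_{z_1^i,\, z_l^j} p(z_1^i)\, g(z_1^i)\, p(z_l^j)\, g(z_l^j) + \sum_{z_1^i,\, z_l^j} p(z_1^i)\, g(z_1^i)\, O(\lambda^{l-i})\, p(z_l^j)\, g(z_l^j) + O(\rho^{j-l}) \\ &= 0 + O(\lambda^{l-i}) + O(\rho^{j-l}), \end{aligned}$$
where $g(z_l^j) = f(z_j|z_l^{j-1}) - E[f(Z_j|Z_1^{j-1})]$, and the first sum in the fourth line vanishes since $E[g(Z_1^i)] = E[X_i] = 0$. Notice that the constants in $O(\lambda^{l-i})$, $O(\rho^{j-l})$ above do not depend on $z_1^j$. Since both $l - i$ and $j - l$ are at least $(j-i-1)/2$, Part 2 then immediately follows. □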



Remark 3.2. By Part 1 of Lemma 3.1, for any fixed $j$, the sequence $E[X_iX_{i+j}]$, $i = 1, 2, \ldots$, is a Cauchy sequence that converges exponentially fast. For any fixed $j$, let $a_j = \lim_{i\to\infty} E[X_iX_{i+j}]$. Then by Part 2, $|a_j|$ decays exponentially as $j \to \infty$; consequently, we deduce (for later use) that $a_0 + 2\sum_{j=1}^{\infty} a_j$ converges.

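Lemma 3.3. Assume Conditions (a), (b) and (c). For any $0 < \varepsilon_0 < 1$, there exists $C > 0$ such that for any $m$ and $n$,
$$\left| \frac{E[(S_{n+m} - S_m)^2]}{n} - \left( a_0 + 2\sum_{j=1}^{\infty} a_j \right) \right| \le C n^{-1+\varepsilon_0};$$
here, recall that, as defined in Remark 3.2, $a_j = \lim_{i\to\infty} E[X_iX_{i+j}]$.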
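Proof. Letting $\beta = n^{-1+\varepsilon_0}$ for a fixed $0 < \varepsilon_0 < 1$, we then have
$$\frac{E[(S_{n+m} - S_m)^2]}{n} = \frac{1}{n} \left( \sum_{j=0} + 2\sum_{0 < j \le \beta n} + 2\sum_{j > \beta n} \right) \sum_{m+1 \le i,\ i+j \le n+m} E[X_iX_{i+j}].$$
By Part 1 of Lemma 3.1 and Remark 3.2, for any $j \ge 0$, $E[X_iX_{i+j}] - a_j = O(\rho^i)$ for some $0 < \rho < 1$. It then follows that for $0 \le j \le \beta n$,
$$\sum_{m+1 \le i,\ i+j \le n+m} E[X_iX_{i+j}] = (n-j)a_j + O(1),$$
where the constant in $O(1)$ depends on neither $j$ nor $m$. Also, by Part 2 of Lemma 3.1 and Remark 3.2, there exists $0 < \theta < 1$ such that for all $j > \beta n$, $E[X_iX_{i+j}] = O(\theta^{\beta n})$, and thus $a_j = O(\theta^{\beta n})$. Continuing the computation, we have
$$\begin{aligned} \frac{E[(S_{n+m} - S_m)^2]}{n} &= \frac{(na_0 + O(1)) + (2(n-1)a_1 + O(1)) + \cdots + (2(n-\beta n)a_{\beta n} + O(1))}{n} + O(n\theta^{\beta n}) \\ &= a_0 + 2a_1 + \cdots + 2a_{\beta n} - \frac{2(a_1 + 2a_2 + \cdots + \beta n\, a_{\beta n})}{n} + \beta\, O(1) + O(n\theta^{\beta n}). \end{aligned}$$
Since $|a_j|$ decays exponentially, we have $\sum_{j > \beta n} |a_j| = O(\theta^{\beta n})$ and $\sum_{j \ge 1} j|a_j| < \infty$; recalling that $\beta = n^{-1+\varepsilon_0}$ and $\beta n = n^{\varepsilon_0}$, the lemma then immediately follows. □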

Remark 3.4. Choosing $m$ in Lemma 3.3 to be 0, we deduce that $\lim_{n\to\infty} \sigma_n^2/n$ exists and is equal to $\sigma^2 = a_0 + 2\sum_{j=1}^{\infty} a_j$.
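The quantities in Remark 3.2, Lemma 3.3 and Remark 3.4 can be computed exactly in a simple special case: take $X_i = f(Y_i) - E[f(Y_i)]$ for a stationary finite-state Markov chain $Y$ with transition matrix $P$ (rather than the general $X_i$ of this section), so that $a_j = \sum_i \pi_i g_i (P^j g)_i$ with $g$ the centered values of $f$. The Python sketch below (hypothetical helper names) computes $a_0 + 2\sum_{j \ge 1} a_j$ by truncating the series, and also evaluates $\sigma_n^2/n = a_0 + 2\sum_{j=1}^{n-1}(1 - j/n)a_j$, whose deviation from the limit reflects the term $-2(a_1 + 2a_2 + \cdots)/n$ in the proof of Lemma 3.3.

```python
# Illustrative sketch only (Markov-chain special case); helper names are hypothetical.
import numpy as np


def stationary(P):
    """Stationary distribution of an irreducible, aperiodic stochastic matrix P."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()


def autocovariances(P, f, j_max):
    """a_j = Cov(f(Y_0), f(Y_j)), j = 0..j_max, for a stationary chain Y with matrix P."""
    pi = stationary(P)
    f = np.asarray(f, dtype=float)
    g = f - pi @ f                            # centered values g = f - E[f(Y_0)]
    a = np.empty(j_max + 1)
    v = g.copy()                              # v = P^j g
    for j in range(j_max + 1):
        a[j] = pi @ (g * v)                   # E[g(Y_0) g(Y_j)]
        v = P @ v
    return a


def asymptotic_variance(a):
    """sigma^2 = a_0 + 2 * sum_{j >= 1} a_j (series truncated at len(a) - 1)."""
    return a[0] + 2.0 * a[1:].sum()


def variance_ratio(a, n):
    """sigma_n^2 / n = a_0 + 2 * sum_{j=1}^{n-1} (1 - j/n) a_j; requires len(a) >= n."""
    j = np.arange(1, n)
    return a[0] + 2.0 * np.sum((1.0 - j / n) * a[1:n])
```

For the symmetric chain with $P = \begin{pmatrix} 3/4 & 1/4 \\ 1/4 & 3/4 \end{pmatrix}$ and $f = (1, -1)$, one has $a_j = 2^{-j}$, so $\sigma^2 = 3$ and $\sigma_n^2/n = 3 - 4/n + O(2^{-n})$.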

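Lemma 3.5. For any $l \in \mathbb{N}$, there exists $C > 0$ such that for all $m$ and $n$,
$$E[(S_{n+m} - S_m)^{2l}] \le Cn^l.$$
Proof. By Condition (c) and the stationarity of $Z$, we observe that for any $m, n$,
$$E[(S_{n+m} - S_m)^{2l}] = E\left[\left(\sum_{i=m+1}^{n+m} X_i\right)^{2l}\right] = E[S_n^{2l}] + O(E[|S_n|^{2l-1}]) + O(E[S_n^{2l-2}]) + \cdots + O(1). \quad (2)$$
Notice that for any $j$,
$$E[|S_n|^{2j-1}] \le E[S_n^{2j}]^{1/2}\, E[S_n^{2j-2}]^{1/2}.$$
So, by (2) and induction on $l$, in order to prove the lemma, it suffices to prove that for any $l \in \mathbb{N}$, there exists $C > 0$ such that
$$E[S_n^{2l}] = E[(X_1 + X_2 + \cdots + X_n)^{2l}] \le Cn^l.$$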

Now, for any $l \in \mathbb{N}$, consider the term $X_{i_1}^{l_1}X_{i_2}^{l_2}\cdots X_{i_k}^{l_k}$, where $1 \le i_1 < i_2 < \cdots < i_k \le n$ and the $l_j$'s are all strictly positive satisfying $l_1 + l_2 + \cdots + l_k \le 2l$. Let $v = v(i_1, i_2, \ldots, i_k)$ be the smallest index such that for all $j = 1, 2, \ldots, k-1$,
$$i_{v+1} - i_v \ge i_{j+1} - i_j. \quad (3)$$
Now, for any $v + 1 \le u \le k$, recalling that
$$X_{i_u} = f(Z_{i_u}|Z_1^{i_u-1}) - E[f(Z_{i_u}|Z_1^{i_u-1})],$$
we define
$$\hat X_{i_u} = f(Z_{i_u}|Z_{h}^{i_u-1}) - E[f(Z_{i_u}|Z_{h}^{i_u-1})], \qquad h = \lfloor (i_v + i_{v+1})/2 \rfloor + 1.$$
Applying Condition (c), we have for some $0 < \rho < 1$
$$X_{i_u} - \hat X_{i_u} = O(\rho^{(i_{v+1} - i_v)/2}).$$
We then have the following decomposition:

= ■ ■ X^X^+J ■ ■ ■ + r(^) [Xjxg ■ ■ ■ X^^] 

= E[xs ■ ■ ■ x'z]E[xtxi ■ ■ ■ ] + ^^'^ [^s^s ■ ■ ■ 

+ r(2)[X^---X^X'"+^---X'M 
= E[X^X'2 ■ ■ ■ X'"1E[X'"+'X'"+' ■ ■ ■ X'M + r[X^X'^ ■ ■ ■ ^^1, 

where r^^^ [Xj^X-^ ■ ■ ■ X-^], r^^^ [XjjXj^ ■ ■ ■ Xj*] are some intermediate terms produced during 
the decomposition and r[Xj'^X-^ ■ ■ ■ X'^] is the residual term resulted from the decomposition. 
Using ([3]) and Conditions (a), (b), (c), we can verify that for some < 6* < 1 

n-l 

r\X^X^ . ■ -Xj] = Y,Y.Oi0'm - m''-') = 0{n). (4) 

Note that the above decomposition can be recursively applied to $E[X_{i_1}^{l_1}\cdots X_{i_v}^{l_v}]$ and $E[X_{i_{v+1}}^{l_{v+1}}\cdots X_{i_k}^{l_k}]$. It then follows that $E[X_{i_1}^{l_1}\cdots X_{i_k}^{l_k}]$ can be decomposed into a sum of a bounded number (depending only on $l$) of terms, each of which takes the following form
$$E[X_{j_1}^{l'_1}]\, E[X_{j_2}^{l'_2}] \cdots E[X_{j_{k_1}}^{l'_{k_1}}]\; r_1 r_2 \cdots r_{k_2},$$
where each $l'_j \ge 2$, $l'_1 + l'_2 + \cdots + l'_{k_1} + 2k_2 \le 2l$ and $r_1, r_2, \ldots, r_{k_2}$ are the residual terms resulting from the recursive decomposition. Then, similarly as in deriving (4), one checks that $E[S_n^{2l}]$ can be written as a sum of a bounded number of terms, each of which is upper bounded by
$$\left( \sum E[|X_{j_1}|^{l'_1}]\, E[|X_{j_2}|^{l'_2}] \cdots E[|X_{j_{k_1}}|^{l'_{k_1}}] \right) O(n)^{k_2},$$
where
$$l'_j \ge 2, \qquad l'_1 + l'_2 + \cdots + l'_{k_1} + 2k_2 \le 2l, \quad (5)$$
and the summation is over all possible $X_{j_1}^{l'_1}X_{j_2}^{l'_2}\cdots X_{j_{k_1}}^{l'_{k_1}}$ satisfying (5), which can be estimated by
$$\sum E[|X_{j_1}|^{l'_1}]\, E[|X_{j_2}|^{l'_2}] \cdots E[|X_{j_{k_1}}|^{l'_{k_1}}] = O(n^{l-k_2}).$$
It then follows that
$$E[S_n^{2l}] = O(n^{l-k_2})\, O(n^{k_2}) = O(n^l).$$
We then have established the lemma. □
Lemma 3.6. For any $l \in \mathbb{N}$, there exists $C > 0$ such that for all $m$ and $n$,
$$E[|S_{n+m} - S_m|^{2l-1}] \le Cn^{l-1/2}.$$
Proof. The lemma immediately follows from Lemma 3.5 and the fact that for any $m, n$,
$$E[|S_{n+m} - S_m|^{2l-1}] \le E[(S_{n+m} - S_m)^{2l}]^{1/2}\, E[(S_{n+m} - S_m)^{2l-2}]^{1/2}.$$
□
3.2 Proof of Theorem 2.1

It follows from Condition (c) that there exists $0 < \rho < 1$ such that for any $j' \le j < i$,
$$|f(z_i|z_j^{i-1}) - f(z_i|z_{j'}^{i-1})| = O(\rho^{i-j}),$$
which implies that $f(Z_i|Z_{-\infty}^{i-1}) = \lim_{j\to-\infty} f(Z_i|Z_j^{i-1})$ exists, and
$$|f(Z_i|Z_1^{i-1}) - f(Z_i|Z_{-\infty}^{i-1})| = O(\rho^{i}),$$
and furthermore
$$|E[f(Z_i|Z_1^{i-1})] - E[f(Z_i|Z_{-\infty}^{i-1})]| = O(\rho^{i}).$$
We then have
$$\frac{\sum_{i=1}^{n} X_i}{n} = \frac{\sum_{i=1}^{n} \left( f(Z_i|Z_1^{i-1}) - E[f(Z_i|Z_1^{i-1})] \right)}{n} = \frac{\sum_{i=1}^{n} \left( f(Z_i|Z_{-\infty}^{i-1}) - E[f(Z_i|Z_{-\infty}^{i-1})] + O(\rho^{i}) \right)}{n}.$$
Here, we remark that the constants in all the above $O$-terms are independent of $i$ and $j$. Note that the sequence $f(Z_i|Z_{-\infty}^{i-1}) - E[f(Z_i|Z_{-\infty}^{i-1})]$ is stationary and ergodic (since $Z$ is $\psi$-mixing, and hence ergodic). Applying the Birkhoff ergodic theorem, and using the fact that $\sum_{i=1}^{n} \rho^{i}/n \to 0$ as $n \to \infty$, we then establish the theorem.



3.3 Proof of Theorem 2.2

For any fixed $0 < \beta < \alpha < 1$, we consecutively partition the partial sum $S_n$ into blocks $\eta_1, \zeta_1, \eta_2, \zeta_2, \ldots$ such that each $\eta_i$ is of length $q = q(n) = n^{\beta}$ and each $\zeta_i$ is of length $p = p(n) = n^{\alpha}$. In other words, for any feasible $i$,
$$\eta_i = X_{(i-1)q+(i-1)p+1} + \cdots + X_{iq+(i-1)p}$$
and
$$\zeta_i = X_{iq+(i-1)p+1} + \cdots + X_{iq+ip}.$$
Then, $S_n$ can be rewritten as a sum of $\eta$-"blocks" and $\zeta$-"blocks":
$$S_n = S_n^{\eta} + S_n^{\zeta} := \sum_{i=1}^{k} \eta_i + \sum_{i=1}^{k} \zeta_i,$$
where $k = k(n) = n/(n^{\alpha} + n^{\beta})$. The above so-called Bernstein blocking method [0] is a standard technique for proving limit theorems for a variety of mixing sequences. Roughly speaking, the partial sum $S_n$ is partitioned into "short blocks" $\eta_1, \eta_2, \ldots, \eta_k$ and "long blocks" $\zeta_1, \zeta_2, \ldots, \zeta_k$. Under certain mixing conditions, all long blocks are "weakly dependent" on each other, while all short blocks are "negligible" in some sense.
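The index bookkeeping of this blocking scheme can be sketched as follows. As assumptions of this illustration only, we round $p = n^{\alpha}$ and $q = n^{\beta}$ down to integers, keep $k = \lfloor n/(p+q) \rfloor$ complete pairs $(\eta_i, \zeta_i)$ and return the leftover indices separately (the argument above ignores such rounding); the function name is hypothetical.

```python
# Illustrative sketch only: rounding conventions and the function name are hypothetical.
import math


def bernstein_blocks(n, alpha, beta):
    """1-based inclusive index ranges of eta_i (length q) and zeta_i (length p),
    laid out as eta_1, zeta_1, eta_2, zeta_2, ...; p = floor(n**alpha), q = floor(n**beta)."""
    p = math.floor(n ** alpha + 1e-9)   # small tolerance guards against rounding in n**alpha
    q = math.floor(n ** beta + 1e-9)
    k = n // (p + q)                    # number of complete (eta, zeta) pairs
    etas = [((i - 1) * q + (i - 1) * p + 1, i * q + (i - 1) * p) for i in range(1, k + 1)]
    zetas = [(i * q + (i - 1) * p + 1, i * q + i * p) for i in range(1, k + 1)]
    tail = (k * (p + q) + 1, n) if k * (p + q) < n else None   # leftover indices, if any
    return etas, zetas, tail
```

For example, `bernstein_blocks(10000, 0.5, 0.25)` gives $q = 10$, $p = 100$, $k = 90$ pairs covering indices $1, \ldots, 9900$, with $\zeta_1$ spanning $11, \ldots, 110$ and the leftover range $9901, \ldots, 10000$.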

Now, we will "truncate" the $\zeta_i$'s to obtain $\tilde\zeta_i$'s. In more detail, recall that for any $j$ with $iq + (i-1)p + 1 \le j \le iq + ip$, we have
$$X_j = f(Z_j|Z_1^{j-1}) - E[f(Z_j|Z_1^{j-1})];$$
we then define
$$\tilde X_j = f(Z_j|Z_{(i-1)p+(i-1)q+\lfloor q/2\rfloor+1}^{j-1}) - E[f(Z_j|Z_{(i-1)p+(i-1)q+\lfloor q/2\rfloor+1}^{j-1})].$$
Applying Condition (c), we derive that
$$X_j - \tilde X_j = O(\rho^{q(n)/2}). \quad (6)$$
We then define
$$\tilde\zeta_i = \tilde X_{iq+(i-1)p+1} + \cdots + \tilde X_{iq+ip},$$
and
$$S'_n = \sum_{i=1}^{k} \tilde\zeta_i, \qquad \sigma'_n = \sqrt{\mathrm{Var}(S'_n)}.$$

With the lemmas in Section 3.1 established, the remainder of the proof of Theorem 2.2 becomes more or less standard, and can be roughly outlined as follows:

1. We first show that $E[\exp(itS'_n/\sigma'_n)]$ and $\prod_{j=1}^{k} E[\exp(it\tilde\zeta_j/\sigma'_n)]$ are "close" (see Lemma 3.9).

2. Then, by the standard Esseen's lemma, we show that $P(S'_n/\sigma'_n < x)$ and $G(x)$ are "close" (see Lemma 3.10).

3. Finally, since $S_n^{\eta}$ is "negligible", we conclude, in the proof of Theorem 2.2, that $P(S_n/\sigma_n < x)$ and $P(S'_n/\sigma'_n < x)$ are "close", and thus $P(S_n/\sigma_n < x)$ and $G(x)$ are "close".

Before proceeding, we first remind the reader of the classical Esseen's inequality (see, e.g., Lemma 5.1 on page 147 of [32]).

Lemma 3.7 (Esseen's Inequality). Let $\xi_1, \xi_2, \ldots, \xi_n$ be independent random variables with $E[\xi_j] = 0$, $E[|\xi_j|^3] < \infty$, $j = 1, 2, \ldots, n$. Let
$$\sigma_n^2 = \sum_{j=1}^{n} E[\xi_j^2], \qquad L_n = \sigma_n^{-3} \sum_{j=1}^{n} E[|\xi_j|^3],$$
and let $F_n(x)$, $\varphi_{F_n}(t)$ be the distribution function and the characteristic function of the random variable $\sum_{j=1}^{n} \xi_j/\sigma_n$, respectively. Then
$$|\varphi_{F_n}(t) - e^{-t^2/2}| \le 16L_n|t|^3 e^{-t^2/3} \quad (7)$$
for $|t| \le 1/(4L_n)$.
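As a quick numerical sanity check of (7) (an illustration only), take i.i.d. Rademacher variables $\xi_j$ with $P(\xi_j = 1) = P(\xi_j = -1) = 1/2$. Then $E[\xi_j^2] = E[|\xi_j|^3] = 1$, so $\sigma_n^2 = n$, $L_n = n^{-1/2}$, and the characteristic function of $\sum_j \xi_j/\sigma_n$ is exactly $\cos(t/\sqrt{n})^n$. The sketch evaluates both sides of (7) on a grid of $|t| \le 1/(4L_n) = \sqrt{n}/4$; the function name is hypothetical.

```python
# Illustrative sketch only: the Rademacher example and function name are hypothetical.
import numpy as np


def esseen_check(n, num_t=2001):
    """Compare both sides of (7) for i.i.d. Rademacher summands on |t| <= 1/(4 L_n).
    Returns (inequality holds on the grid, max of lhs/rhs)."""
    L = n ** -0.5                                   # L_n = n * E|xi|^3 / n**1.5
    t = np.linspace(-1.0 / (4.0 * L), 1.0 / (4.0 * L), num_t)
    phi = np.cos(t / np.sqrt(n)) ** n               # exact ch.f. of (xi_1+...+xi_n)/sqrt(n)
    lhs = np.abs(phi - np.exp(-t ** 2 / 2.0))
    rhs = 16.0 * L * np.abs(t) ** 3 * np.exp(-t ** 2 / 3.0)
    ratio = lhs / np.maximum(rhs, 1e-300)           # rhs vanishes only at t = 0, where lhs = 0
    return bool(np.all(lhs <= rhs + 1e-15)), float(np.max(ratio))
```

In this example the bound is far from tight: near $t = 0$ the left-hand side behaves like $t^4/(12n)$ while the right-hand side behaves like $16|t|^3/\sqrt{n}$.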

The following lemma is a version of Esseen's lemma, which gives an upper bound on the difference between two distribution functions in terms of the difference between the two corresponding characteristic functions. We refer to page 314 of [40] for a standard proof.
Lemma 3.8 (Esseen's Lemma). Let $F(x)$ and $G(x)$ be distribution functions with characteristic functions $\varphi_F(t)$ and $\varphi_G(t)$, respectively. Suppose that the distributions corresponding to $F(x)$ and $G(x)$ each have mean 0, and that $G(x)$ is differentiable with $|G'(x)| \le M$ for any $x$, for some $M > 0$. Then
$$\sup_x |F(x) - G(x)| \le \frac{1}{\pi} \int_{-T}^{T} \left| \frac{\varphi_F(t) - \varphi_G(t)}{t} \right| dt + \frac{24M}{\pi T}$$
for every $T > 0$.

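We will need the following lemma.

Lemma 3.9. There exist $C > 0$, $0 < \rho_1 < 1$ such that for all $n$ and $|t| \le n^{1/2}$,
$$\left| E[\exp(itS'_n/\sigma'_n)] - \prod_{j=1}^{k} E[\exp(it\tilde\zeta_j/\sigma'_n)] \right| \le C\rho_1^{n^{\beta}}.$$
Proof. Let $l = (k-1)p + (k-1)q + \lfloor q/2\rfloor + 1$. By Condition (a), there exists $0 < \lambda < 1$ such that
$$\begin{aligned} E\left[\exp\left(it\sum_{j=1}^{k} \tilde\zeta_j/\sigma'_n\right)\right] &= E\left[\exp\left(it\sum_{i=1}^{k-1} \tilde\zeta_i/\sigma'_n\right) \exp(it\tilde\zeta_k/\sigma'_n)\right] \\ &= E\left[\exp\left(it\sum_{i=1}^{k-1} \tilde\zeta_i/\sigma'_n\right) \exp\left(it\sum_{i=kq+(k-1)p+1}^{kq+kp} g(Z_l^i)/\sigma'_n\right)\right] \\ &= E\left[\exp\left(it\sum_{i=1}^{k-1} \tilde\zeta_i/\sigma'_n\right)\right] E\left[\exp\left(it\sum_{i=kq+(k-1)p+1}^{kq+kp} g(Z_l^i)/\sigma'_n\right)\right] + O(\lambda^{q(n)/2}) \\ &= E\left[\exp\left(it\sum_{i=1}^{k-1} \tilde\zeta_i/\sigma'_n\right)\right] E[\exp(it\tilde\zeta_k/\sigma'_n)] + O(\lambda^{q(n)/2}), \end{aligned}$$
where, again, $f(z_j|z_l^{j-1}) - E[f(Z_j|Z_l^{j-1})]$ is rewritten as $g(z_l^j)$. Noticing that $|E[\exp(it\tilde\zeta_j/\sigma'_n)]| \le 1$ and applying an inductive argument, we conclude that
$$E[\exp(itS'_n/\sigma'_n)] = E\left[\exp\left(it\sum_{j=1}^{k} \tilde\zeta_j/\sigma'_n\right)\right] = \prod_{j=1}^{k} E[\exp(it\tilde\zeta_j/\sigma'_n)] + O(k\lambda^{q(n)/2}),$$
which immediately implies the lemma. □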

Now, applying Lemma 3.8, we can derive the following lemma.

Lemma 3.10. There exists $C > 0$ such that for all $n$,
$$\sup_x |P(S'_n/\sigma'_n < x) - G(x)| \le Cn^{-1/2+\alpha/2}.$$
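Proof. Note that all the $\tilde\zeta_j$'s have the same distribution. So, Lemma 3.9 in fact implies that
$$|E[\exp(itS'_n/\sigma'_n)] - (E[\exp(it\tilde\zeta_1/\sigma'_n)])^k| = O(\rho_1^{n^{\beta}}), \quad (8)$$
for some $0 < \rho_1 < 1$. Consider a sequence of i.i.d. random variables $\xi_j$, $j = 1, 2, \ldots, k$, each of which is distributed according to $\tilde\zeta_1$. It then follows from (8) that
$$|E[\exp(itS'_n/\sigma'_n)] - (E[\exp(it\xi_1/\sigma'_n)])^k| = O(\rho_1^{n^{\beta}}). \quad (9)$$
Now, let
$$\bar\sigma_n^2 = \sum_{j=1}^{k} E[\xi_j^2] = kE[\tilde\zeta_1^2]. \quad (10)$$
It follows from Condition (a) that for some $0 < \lambda < 1$,
$$|\bar\sigma_n^2 - \sigma'^2_n| = O(k^2 E[\tilde\zeta_1^2]\lambda^{q(n)/2}),$$
which implies that
$$(E[\exp(it\xi_1/\sigma'_n)])^k - (E[\exp(it\xi_1/\bar\sigma_n)])^k = O(\rho_2^{n^{\beta}}), \quad (11)$$
for some $0 < \rho_2 < 1$. Therefore, combining (9) and (11), we deduce that
$$|E[\exp(itS'_n/\sigma'_n)] - (E[\exp(it\xi_1/\bar\sigma_n)])^k| = O(\rho_3^{n^{\beta}}), \quad (12)$$
for some $0 < \rho_3 < 1$. So, in the sense of (12), we can approximate $S'_n/\sigma'_n$ using the sum of the i.i.d. random variables $\xi_j/\bar\sigma_n$, $j = 1, 2, \ldots, k$, each of which is distributed according to $\tilde\zeta_1/\bar\sigma_n$. Applying Lemma 3.7 to the i.i.d. sequence $\xi_j/\bar\sigma_n$, we deduce that for $|t| \le 1/(4L_n)$,
$$|(E[\exp(it\xi_1/\bar\sigma_n)])^k - e^{-t^2/2}| \le 16L_n|t|^3 e^{-t^2/3}, \quad (13)$$
where
$$L_n = \frac{kE[|\xi_1|^3]}{\bar\sigma_n^3}.$$
Note that, by (10) and Lemma 3.3, we have
$$\bar\sigma_n^2 = kE[\tilde\zeta_1^2] = \sigma^2 n(1 + o(1)).$$
Furthermore, by (6) and Lemma 3.6, we have
$$kE[|\tilde\zeta_1|^3] = kO(p(n)^{3/2}) = O(n^{1+\alpha/2}).$$
It then follows that there exists $C_1 > 0$ such that for all $n$,
$$L_n \le C_1 n^{-1/2+\alpha/2}. \quad (14)$$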
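From now on, let $\varphi_{F_n}(t)$, $\varphi_{F'_n}(t)$ be the characteristic functions of the random variables $\sum_{j=1}^{k} \xi_j/\bar\sigma_n$, $S'_n/\sigma'_n$, respectively. Then, by Lemma 3.8, we have
$$\sup_x |P(S'_n/\sigma'_n < x) - G(x)| \le \frac{1}{\pi} \int_{-T}^{T} \left| \frac{\varphi_{F'_n}(t) - e^{-t^2/2}}{t} \right| dt + \frac{24M}{\pi T}$$
for every $T > 0$. It then follows that for any $T > 0$,
$$\begin{aligned} \sup_x |P(S'_n/\sigma'_n < x) - G(x)| &\le \frac{1}{\pi} \int_{-T}^{T} \left| \frac{\varphi_{F'_n}(t) - \varphi_{F_n}(t)}{t} \right| dt + \frac{1}{\pi} \int_{-T}^{T} \left| \frac{\varphi_{F_n}(t) - e^{-t^2/2}}{t} \right| dt + \frac{24M}{\pi T} \\ &\le \frac{1}{\pi} \int_{|t| < n^{-1/2}} \left| \frac{\varphi_{F'_n}(t) - \varphi_{F_n}(t)}{t} \right| dt + \frac{1}{\pi} \int_{n^{-1/2} \le |t| \le T} \left| \frac{\varphi_{F'_n}(t) - \varphi_{F_n}(t)}{t} \right| dt \\ &\quad + \frac{1}{\pi} \int_{-T}^{T} \left| \frac{\varphi_{F_n}(t) - e^{-t^2/2}}{t} \right| dt + \frac{24M}{\pi T}. \end{aligned}$$
Note that, since both random variables have mean 0 and variance 1, there exists $C_2 > 0$ such that for all $t$,
$$|\varphi_{F'_n}(t) - \varphi_{F_n}(t)| \le C_2 t^2. \quad (15)$$
Now, setting $T = 1/(4C_1 n^{-1/2+\alpha/2})$ and applying (12), (13), (14) and (15), we then have
$$\sup_x |P(S'_n/\sigma'_n < x) - G(x)| \le \frac{C_2}{\pi n} + O(\rho_3^{n^{\beta}} \log n) + \frac{16C_1 n^{-1/2+\alpha/2}}{\pi} \int_{-\infty}^{\infty} t^2 e^{-t^2/3}\, dt + \frac{96MC_1 n^{-1/2+\alpha/2}}{\pi},$$
which immediately implies the lemma. □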



We are now ready to prove Theorem 2.2. The key point is that $P(S_n/\sigma_n < x)$ is close to $P(S'_n/\sigma'_n < x)$.



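Proof of Theorem 2.2. Applying Conditions (a), (c) and Lemma 3.3, we deduce that for any small $\varepsilon_0 > 0$,
$$\sigma_n^2 = \sigma^2 n + O(n^{\varepsilon_0})$$
and
$$\begin{aligned} \sigma'^2_n = E\left[\left(\sum_{i=1}^{k} \tilde\zeta_i\right)^2\right] &= \sum_{i=1}^{k} E[\tilde\zeta_i^2] + 2\sum_{i<j} E[\tilde\zeta_i\tilde\zeta_j] \\ &= kE[(\zeta_1 + O(n^{\alpha}\rho^{q(n)/2}))^2] + 2\sum_{i<j} E[\tilde\zeta_i\tilde\zeta_j] \\ &= \frac{n}{n^{\alpha} + n^{\beta}} \left( \sigma^2 n^{\alpha} + O(n^{\alpha\varepsilon_0}) \right) + o(1). \end{aligned}$$
It then follows that
$$\sigma_n^2 - \sigma'^2_n = O(n^{1-\alpha+\beta}).$$
Next, applying Condition (b) and Lemma 3.3, we have, through simple computations, that
$$\sigma_n = \Theta(n^{1/2}) \quad \text{and} \quad \sigma'_n = \Theta(n^{1/2}).$$
We then observe that
$$\begin{aligned} \frac{S_n}{\sigma_n} &= \frac{S_n - S'_n}{\sigma_n} + \frac{S'_n}{\sigma'_n} + \frac{S'_n(\sigma'^2_n - \sigma_n^2)}{\sigma'_n\sigma_n(\sigma'_n + \sigma_n)} \\ &= \frac{S_n - S'_n}{\sigma_n} + \frac{S'_n}{\sigma'_n} + S'_n\, \frac{O(n^{1+\beta-\alpha})}{\Theta(n^{1/2})\,\Theta(n^{1/2})\,\Theta(n^{1/2})} \\ &= \frac{S_n - S'_n}{\sigma_n} + \frac{S'_n}{\sigma'_n} + S'_n\, O(n^{\beta-\alpha-1/2}). \end{aligned}$$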



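For some $\tau < 0$, let $A_1$ denote the event that
$$|S_n - S'_n| \ge n^{\tau+1/2},$$
and let $A_2$ denote the event that
$$|S'_n| \ge n^{\tau+1/2-\beta+\alpha}.$$
Then, by the Markov inequality, we have, for any $l \in \mathbb{N}$,
$$P(A_1) = P(|S_n - S'_n| \ge n^{\tau+1/2}) \le \frac{E[|S_n - S'_n|^{2l}]}{n^{(\tau+1/2)2l}}.$$
Note that there exist $0 < \theta_1, \theta_2 < 1$ such that
$$E[|S_n - S'_n|^{2l}] = E[(S_n^{\eta})^{2l}] + O(\theta_1^{q(n)/2})$$
and
$$E[(S_n^{\eta})^{2l}] = \sum_{l_1 + l_2 + \cdots + l_k = 2l} O\left( E[\eta_1^{l_1}]\, E[\eta_2^{l_2}] \cdots E[\eta_k^{l_k}] \right) + O(\theta_2^{p(n)/2}),$$
where the terms with some $l_i = 1$ vanish since $E[\eta_i] = 0$. Then, by Lemmas 3.5 and 3.6, we obtain, through some further computations, that
$$E[|S_n - S'_n|^{2l}] = O((kn^{\beta})^l).$$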
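Now, applying Lemmas 3.5, 3.6 and Conditions (a) and (c), one can verify that
$$P(A_1) = \frac{O((kn^{\beta})^l)}{n^{(\tau+1/2)2l}} = O(n^{-(2\tau+\alpha-\beta)l}). \quad (16)$$
Again, by the Markov inequality, we have, for any $l$,
$$P(A_2) = P(|S'_n| \ge n^{\tau+1/2-\beta+\alpha}) \le \frac{E[S'^{2l}_n]}{n^{(\tau+1/2-\beta+\alpha)2l}}.$$
Similarly, applying Lemmas 3.5, 3.6 and Conditions (a), (c), one can verify that
$$P(A_2) = \frac{O(n^l)}{n^{(\tau+1/2-\beta+\alpha)2l}} = O(n^{-(2\tau+2\alpha-2\beta)l}). \quad (17)$$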

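Apparently,
$$P(S_n/\sigma_n < x) = P(S_n/\sigma_n < x,\ A_1^c \cap A_2^c) + P(S_n/\sigma_n < x,\ A_1 \cup A_2),$$
and
$$P(S_n/\sigma_n < x) = P(S'_n/\sigma'_n < x + S'_n/\sigma'_n - S_n/\sigma_n).$$
It then follows from (16) and (17) that for any $\tau > -\alpha/2$, there exist $\beta > 0$ sufficiently small and $l \in \mathbb{N}$ sufficiently large such that
$$P(S_n/\sigma_n < x,\ A_1 \cup A_2) \le P(A_1) + P(A_2) = O(n^{-1/2}), \quad (18)$$
and
$$\begin{aligned} P(S_n/\sigma_n < x,\ A_1^c \cap A_2^c) &\ge P(S'_n/\sigma'_n < x - C_1n^{\tau},\ A_1^c \cap A_2^c) \\ &\ge P(S'_n/\sigma'_n < x - C_1n^{\tau}) + P(A_1^c \cap A_2^c) - 1 \\ &\ge P(S'_n/\sigma'_n < x - C_1n^{\tau}) - C_2n^{-1/2}, \end{aligned}$$
for some $C_1, C_2 > 0$. On the other hand, it is easy to check that there exists $C_3 > 0$ such that
$$P(S_n/\sigma_n < x,\ A_1^c \cap A_2^c) \le P(S'_n/\sigma'_n < x + C_3n^{\tau}).$$
Noticing that
$$|P(S_n/\sigma_n < x) - G(x)| \le \max\left\{ P(S'_n/\sigma'_n < x + C_3n^{\tau}) - G(x) + C_2n^{-1/2},\ G(x) - P(S'_n/\sigma'_n < x - C_1n^{\tau}) + C_2n^{-1/2} \right\}$$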
and applying Lemma 3.10, we derive
$$|P(S'_n/\sigma'_n < x + C_3n^{\tau}) - G(x)| \le |P(S'_n/\sigma'_n < x + C_3n^{\tau}) - G(x + C_3n^{\tau})| + |G(x + C_3n^{\tau}) - G(x)| = O(n^{-1/2+\alpha/2}) + O(n^{\tau}),$$
and similarly,
$$|G(x) - P(S'_n/\sigma'_n < x - C_1n^{\tau}) + C_2n^{-1/2}| = O(n^{-1/2+\alpha/2}) + O(n^{\tau}) + O(n^{-1/2}).$$
Setting $\alpha = 1/2$, $\tau$ slightly larger than $-1/4$, and choosing $\beta > 0$ sufficiently small, we then have established the theorem. □

Remark 3.11. If Condition (d) fails, i.e., $\lim_{n\to\infty} \sigma_n/\sqrt{n} = 0$, then a CLT of degenerate form holds for $\{X_i, i \in \mathbb{N}\}$; more precisely, the distribution of $(X_1 + X_2 + \cdots + X_n)/\sqrt{n}$ converges to that of a centered normal distribution with variance 0, i.e., a point mass at 0, as $n \to \infty$. This can be readily checked since for any $\varepsilon > 0$, by the Markov inequality, we have
$$P(|X_1 + X_2 + \cdots + X_n|/\sqrt{n} \ge \varepsilon) \le \sigma_n^2/(n\varepsilon^2) \to 0 \quad \text{as } n \to \infty.$$
3.4 Proof of Theorem 2.3

Consider the following Bernstein blocking method with variable block lengths: we consecutively partition the partial sum $S_N$ into blocks $\eta_1, \zeta_1, \eta_2, \zeta_2, \ldots$ such that $\eta_j$ is of length $q_j = j^{\beta}$ and $\zeta_j$ is of length $p_j = j^{\alpha}$. Similarly as in the proof of Theorem 2.2, we have
$$S_N = S_N^{\eta} + S_N^{\zeta},$$
where $S_N^{\eta}$ is the sum of all feasible $\eta$-blocks and $S_N^{\zeta}$ is the sum of all feasible $\zeta$-blocks. Let $\mathcal{C}_i$ denote the $\sigma$-algebra generated by all the $Z_j$'s up to the last index of the block $\zeta_i$. It is well known that $S_N^{\zeta}$ can be approximated using a martingale in the following manner:
$$\zeta_i = \xi_i + u_i - u_{i+1},$$
where
$$\xi_i = \sum_{k=0}^{\infty} \left( E[\zeta_{i+k}|\mathcal{C}_i] - E[\zeta_{i+k}|\mathcal{C}_{i-1}] \right)$$
is a martingale difference sequence, and
$$u_i = \sum_{k=0}^{\infty} E[\zeta_{i+k}|\mathcal{C}_{i-1}].$$
Similarly as in the proof of Theorem 2.2, we truncate the $\zeta$-blocks in the following way. Consider a $\zeta$-block taking the following form
$$\zeta_i = X_{j_1} + X_{j_1+1} + \cdots + X_{j_2}.$$
For any $j = j_1, j_1+1, \ldots, j_2$, define
$$\tilde X_j = f(Z_j|Z_{j_1-\lfloor q_i/2\rfloor}^{j-1}) - E[f(Z_j|Z_{j_1-\lfloor q_i/2\rfloor}^{j-1})],$$
and further
$$\tilde\zeta_i = \tilde X_{j_1} + \cdots + \tilde X_{j_2}.$$
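Before proving Theorem 2.3, we need to establish several lemmas. The following lemma states that $u_i$ is sub-exponentially small with respect to $i$.

Lemma 3.12. There exist $C > 0$, $0 < \theta < 1$ and $0 < \delta < \beta$ such that for all $i$,
$$|u_i| \le C\theta^{i^{\delta}}.$$
Proof. Recall that for some $0 < \rho < 1$,
$$\zeta_i - \tilde\zeta_i = O(\rho^{i^{\beta}/2}).$$
We then have, by Condition (a), for some $0 < \lambda < 1$,
$$\begin{aligned} u_i = \sum_{k=0}^{\infty} E[\zeta_{i+k}|\mathcal{C}_{i-1}] &= \sum_{k=0}^{\infty} E[\tilde\zeta_{i+k}|\mathcal{C}_{i-1}] + \sum_{k=0}^{\infty} O(\rho^{(i+k)^{\beta}/2}) \\ &= \sum_{k=0}^{\infty} \left( E[\tilde\zeta_{i+k}] + O(\lambda^{(i+k)^{\beta}/2}) \right) + \sum_{k=0}^{\infty} O(\rho^{(i+k)^{\beta}/2}). \end{aligned}$$
Noting that $E[\zeta_{i+k}] = 0$ (so that $E[\tilde\zeta_{i+k}] = O(\rho^{(i+k)^{\beta}/2})$) and the constants in the above $O$-terms are independent of $k$, we conclude that $u_i$ is sub-exponentially small with respect to $i$. □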

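By the classical Skorokhod representation theorem (see [3]), there exist non-negative random variables $T_i$ such that for all feasible $M$,
$$\sum_{i \le M} \xi_i = B\left( \sum_{i \le M} T_i \right)$$
and
$$E[T_i|\mathcal{C}_{i-1}] = E[\xi_i^2|\mathcal{C}_{i-1}] \ \text{a.s.}, \qquad E[T_i^p] = O(E[|\xi_i|^{2p}]) \ \text{for each } p > 1.$$
Let $M_N$ denote the index of the $\zeta$-block or the $\eta$-block containing $X_N$. Then, depending on whether $X_N$ is contained in a $\zeta$-block or an $\eta$-block, we have either
$$\sum_{i=1}^{M_N-1} p_i + \sum_{i=1}^{M_N} q_i < N \le \sum_{i=1}^{M_N} p_i + \sum_{i=1}^{M_N} q_i$$
or
$$\sum_{i=1}^{M_N-1} p_i + \sum_{i=1}^{M_N-1} q_i < N \le \sum_{i=1}^{M_N-1} p_i + \sum_{i=1}^{M_N} q_i.$$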
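Note that for any $a \ge 0$,
$$\int_{0}^{n} x^{a}\, dx \le 1^{a} + 2^{a} + \cdots + n^{a} \le \int_{1}^{n+1} x^{a}\, dx.$$
Using the fact that $\int x^{a}\, dx = x^{a+1}/(a+1)$, we deduce that
$$\frac{n^{a+1}}{a+1} \le 1^{a} + 2^{a} + \cdots + n^{a} \le \frac{(n+1)^{a+1} - 1}{a+1}.$$
We then have either
$$\frac{(M_N - 1)^{\alpha+1}}{\alpha+1} + \frac{M_N^{\beta+1}}{\beta+1} \le N \le \frac{(M_N + 1)^{\alpha+1} - 1}{\alpha+1} + \frac{(M_N + 1)^{\beta+1} - 1}{\beta+1}$$
or
$$\frac{(M_N - 1)^{\alpha+1}}{\alpha+1} + \frac{(M_N - 1)^{\beta+1}}{\beta+1} \le N \le \frac{M_N^{\alpha+1} - 1}{\alpha+1} + \frac{(M_N + 1)^{\beta+1} - 1}{\beta+1}.$$
Apparently, we have, for either of the above cases,
$$M_N = \Theta(N^{1/(\alpha+1)}).$$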
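The two elementary facts used above can be checked numerically: the sandwich bound for $1^a + \cdots + n^a$, and the growth $M_N = \Theta(N^{1/(\alpha+1)})$ of the block index. The sketch below assumes, for illustration only, that the block lengths are rounded down to integers (with a minimum length of 1), which the argument above does not do; the helper names are hypothetical.

```python
# Illustrative sketch only: rounding of block lengths and helper names are hypothetical.
import math


def power_sum_bounds(n, a):
    """(lower, exact, upper) for 1^a + ... + n^a with the integral bounds, a >= 0."""
    exact = sum(i ** a for i in range(1, n + 1))
    lower = n ** (a + 1) / (a + 1)
    upper = ((n + 1) ** (a + 1) - 1) / (a + 1)
    return lower, exact, upper


def block_index(N, alpha, beta):
    """Index M_N of the eta- or zeta-block containing X_N when eta_j and zeta_j have
    lengths max(1, floor(j**beta)) and max(1, floor(j**alpha))."""
    pos, j = 0, 0
    while pos < N:                      # advance pair by pair: eta_j, then zeta_j
        j += 1
        pos += max(1, math.floor(j ** beta)) + max(1, math.floor(j ** alpha))
    return j
```

With $N = 10^6$, $\alpha = 1/2$ and $\beta = 0.1$, the ratio $M_N/N^{1/(\alpha+1)}$ should come out close to $(\alpha+1)^{1/(\alpha+1)} = 1.5^{2/3} \approx 1.31$, the constant suggested by the leading terms of the bounds above.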

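As elaborated in [33], a somewhat standard procedure can be followed to establish an almost sure invariance principle. For Theorem 2.3 in this paper, it suffices to prove that

1. for any $\varepsilon > 0$,
$$\sum_{i=1}^{M_N} \eta_i = O(N^{1/3+\varepsilon}) \quad \text{a.s.}; \quad (19)$$

2. for any $\varepsilon > 0$,
$$\sum_{i=1}^{M_N} T_i = \sigma^2 N + O(N^{2/3+\varepsilon}) \quad \text{a.s.}, \quad (20)$$

as $N$ tends to infinity.

We will establish (19) in Lemma 3.14. To establish (20), consider the following decomposition
$$\sum_{i=1}^{M_N} T_i - \sigma^2 N = \sum_{i=1}^{M_N} \left( T_i - E[T_i|\mathcal{C}_{i-1}] \right) + \sum_{i=1}^{M_N} \left( E[\xi_i^2|\mathcal{C}_{i-1}] - \xi_i^2 \right) + \left( \sum_{i=1}^{M_N} \xi_i^2 - \sigma^2 N \right).$$
It is then clear that we only need to prove that all the above three terms are $O(N^{2/3+\varepsilon})$ for any $\varepsilon > 0$.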

We need the following well-known lemma, whose proof can be found in [33].

Lemma 3.13. Let $\{x_j\}$ be a sequence of centered random variables with finite second moments. Suppose that there exists a constant $s > 0$ such that for all integers $k > j$,
$$E\left[\left(\sum_{i=j}^{k} x_i\right)^2\right] = O(k^{s} - j^{s}).$$
Then for each $\delta > 0$, we have
$$\sum_{j=1}^{N} x_j = O(N^{s/2}\log^{2+\delta} N) \quad \text{a.s.}$$
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The following lemma establishes (fT9|) . 
Lemma 3.14. With probability 1, 



i=l 



for any e > 0. 

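Proof. Note that for any $j < k$,

$$E\Big[\Big(\sum_{i=j}^{k}\eta_i\Big)^2\Big] = \sum_{i=j}^{k}E[\eta_i^2] + 2\sum_{j \le i_1 < i_2 \le k}E[\eta_{i_1}\eta_{i_2}].$$

First, notice that an argument parallel to the proof of Part 2 of Lemma 3.1, together with Conditions (a) and (c), implies that $E[\eta_{i_1}\eta_{i_2}]$ is sub-exponentially small in the length of the gap separating $\eta_{i_1}$ and $\eta_{i_2}$, and thus

$$\sum_{j \le i_1 < i_2 \le k}E[\eta_{i_1}\eta_{i_2}] = O(1).$$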

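Applying Lemma 3.3, we have, for some small $\varepsilon_0 > 0$,

$$E\Big[\Big(\sum_{i=j}^{k}\eta_i\Big)^2\Big] = \sum_{i=j}^{k}\big(\sigma^2 i^{\beta} + O(i^{\varepsilon_0})\big) + O(1) = O(k^{\beta+1} - j^{\beta+1}) + O(k^{\varepsilon_0+1} - j^{\varepsilon_0+1}).$$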

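Both terms above are $O(k^{\beta'+1} - j^{\beta'+1})$ for any $\beta' > \max\{\beta, \varepsilon_0\}$. It then follows from Lemma 3.13 (with $s = \beta' + 1$) that for any such $\beta'$ and any $\delta > 0$,

$$\sum_{i=1}^{M_N}\eta_i = O\big(M_N^{(\beta'+1)/2}\log^{2+\delta}M_N\big) = O\big(N^{(\beta'+1)/(2(\alpha+1))}\log^{2+\delta}N\big) \quad \text{a.s.},$$

where we have applied the fact that $M_N = O(N^{1/(\alpha+1)})$. Choosing $\beta, \varepsilon_0, \beta', \delta > 0$ sufficiently small and setting $\alpha = 1/2$, the lemma then immediately follows. □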

The following three lemmas collectively establish (20).

Lemma 3.15. With probability 1,

$$\sum_{i=1}^{M_N}\tilde{\xi}_i^2 - \sigma^2 N = O(N^{2/3+\varepsilon})$$

for any $\varepsilon > 0$.

Proof. Note that by Lemma 3.12, $\tilde{\xi}_i$ and $\xi_i$ are sub-exponentially close. So we only need to prove that

$$\sum_{i=1}^{M_N}\xi_i^2 - \sigma^2 N = O(N^{2/3+\varepsilon}) \quad \text{a.s.}$$

for any $\varepsilon > 0$.

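Depending on whether $X_N$ is contained in a $\xi$-block or an $\eta$-block, we have either

$$\sum_{i=1}^{M_N}i^{\alpha} - M_N^{\alpha} + \sum_{i=1}^{M_N}i^{\beta} < N \le \sum_{i=1}^{M_N}i^{\alpha} + \sum_{i=1}^{M_N}i^{\beta},$$

which implies that

$$\Big|N - \sum_{i=1}^{M_N}i^{\alpha}\Big| \le \sum_{i=1}^{M_N}i^{\beta} + M_N^{\alpha},$$

or

$$\sum_{i=1}^{M_N-1}i^{\alpha} + \sum_{i=1}^{M_N}i^{\beta} - M_N^{\beta} < N \le \sum_{i=1}^{M_N-1}i^{\alpha} + \sum_{i=1}^{M_N}i^{\beta},$$

which implies that

$$\Big|N - \sum_{i=1}^{M_N}i^{\alpha}\Big| \le \sum_{i=1}^{M_N}i^{\beta} + M_N^{\alpha}.$$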

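In any case, applying Lemma 3.3, we have, for some small $\varepsilon_0 > 0$,

$$E\Big[\sum_{i=1}^{M_N}\xi_i^2\Big] - \sigma^2 N = \sigma^2\Big(\sum_{i=1}^{M_N}i^{\alpha} - N\Big) + \sum_{i=1}^{M_N}O(i^{\varepsilon_0}) = O\Big(\sum_{i=1}^{M_N}i^{\beta} + M_N^{\alpha}\Big) + O(M_N^{\varepsilon_0+1}) = O\big(N^{(\beta+1)/(\alpha+1)} + N^{\alpha/(\alpha+1)} + N^{(\varepsilon_0+1)/(\alpha+1)}\big),$$

where we have applied the fact that $M_N = \Theta(N^{1/(\alpha+1)})$. Choosing $\beta, \varepsilon_0 > 0$ small enough and setting $\alpha = 1/2$, we then have

$$E\Big[\sum_{i=1}^{M_N}\xi_i^2\Big] - \sigma^2 N = O(N^{2/3+\varepsilon})$$

for any $\varepsilon > 0$.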

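So, to prove the lemma, it suffices to prove that, with probability 1,

$$\sum_{i=1}^{M_N}\big(\xi_i^2 - E[\xi_i^2]\big) = O(N^{2/3+\varepsilon})$$

for any $\varepsilon > 0$. Using Conditions (a) and (c), we derive that, with probability 1, $\big|E[\xi_i^2] - E[\xi_i^2 \mid \mathcal{L}_{i-1}]\big|$ is sub-exponentially small in $i$: the block $\xi_i$ is separated from $\mathcal{L}_{i-1}$ by the block $\eta_i$ of length $i^{\beta}$, and the resulting error terms are controlled by $E[\xi_i^2]$, $E[|\xi_i|]$ and the exponential rates in Conditions (a) and (c).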
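Applying Lemma 3.5, we then have, for any $j < k$,

$$\begin{aligned} E\Big[\Big(\sum_{i=j}^{k}\big(\xi_i^2 - E[\xi_i^2]\big)\Big)^2\Big] &= E\Big[\Big(\sum_{i=j}^{k}\big(\xi_i^2 - E[\xi_i^2 \mid \mathcal{L}_{i-1}]\big)\Big)^2\Big] + O(1)\\ &= \sum_{i=j}^{k}E\big[\big(\xi_i^2 - E[\xi_i^2 \mid \mathcal{L}_{i-1}]\big)^2\big] + O(1)\\ &\le \sum_{i=j}^{k}E[\xi_i^4] + O(1)\\ &= O\big(j^{2\alpha} + (j+1)^{2\alpha} + \cdots + k^{2\alpha}\big) + O(1)\\ &= O(k^{2\alpha+1} - j^{2\alpha+1}), \end{aligned}$$

where we have used the fact that for $i_1 \ne i_2$,

$$E\big[\big(\xi_{i_1}^2 - E[\xi_{i_1}^2 \mid \mathcal{L}_{i_1-1}]\big)\big(\xi_{i_2}^2 - E[\xi_{i_2}^2 \mid \mathcal{L}_{i_2-1}]\big)\big] = 0.$$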
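Applying Lemma 3.13 (with $s = 2\alpha + 1$), we then have, for any $\alpha' > \alpha$,

$$\sum_{i=1}^{M_N}\big(\xi_i^2 - E[\xi_i^2]\big) = O\big(M_N^{\alpha'+1/2}\big) = O\big(N^{(\alpha'+1/2)/(\alpha+1)}\big) \quad \text{a.s.}$$

Setting $\alpha = 1/2$ and choosing $\alpha'$ slightly larger than $1/2$, we have proven the lemma. □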
Lemma 3.16. With probability 1,

$$\sum_{i=1}^{M_N}\big(E[\tilde{\xi}_i^2 \mid \mathcal{L}_{i-1}] - \tilde{\xi}_i^2\big) = O(N^{2/3+\varepsilon})$$

for any $\varepsilon > 0$.

Proof. Note that by Lemma 3.12, $\tilde{\xi}_i$ and $\xi_i$ are sub-exponentially close. So we only need to prove that

$$\sum_{i=1}^{M_N}\big(E[\xi_i^2 \mid \mathcal{L}_{i-1}] - \xi_i^2\big) = O(N^{2/3+\varepsilon}) \quad \text{a.s.}$$

for any $\varepsilon > 0$. But this has been established in the proof of the previous lemma. □
Lemma 3.17. With probability 1,

$$\sum_{i=1}^{M_N}\big(\tau_i - E[\tau_i \mid \mathcal{L}_{i-1}]\big) = O(N^{2/3+\varepsilon})$$

for any $\varepsilon > 0$.
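Proof. As in the proof of Lemma 3.15, we have that for any $j < k$,

$$E\Big[\Big(\sum_{i=j}^{k}\big(\tau_i - E[\tau_i \mid \mathcal{L}_{i-1}]\big)\Big)^2\Big] = \sum_{i=j}^{k}E\big[\big(\tau_i - E[\tau_i \mid \mathcal{L}_{i-1}]\big)^2\big] \le \sum_{i=j}^{k}E[\tau_i^2] \le O\big(j^{2\alpha} + (j+1)^{2\alpha} + \cdots + k^{2\alpha}\big) = O(k^{2\alpha+1} - j^{2\alpha+1}),$$

where we have used $E[\tau_i^2] = O(E[\tilde{\xi}_i^4])$. Then, similarly as in the proof of Lemma 3.16, we deduce that

$$\sum_{i=1}^{M_N}\big(\tau_i - E[\tau_i \mid \mathcal{L}_{i-1}]\big) = O(N^{2/3+\varepsilon}) \quad \text{a.s.}$$

for any $\varepsilon > 0$. □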
3.5 Proof of Theorem 2.5

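In this proof, we assume the Bernstein blocking as in Theorem 2.2. Notice that

$$P(S_n/n > \varepsilon) = P(S_n > n\varepsilon) = P(S_n' + S_n - S_n' > n\varepsilon) = P\big(S_n' > n\varepsilon - (S_n - S_n')\big).$$

Notice that $S_n - S_n' = o(n)$, so we have

$$P(S_n/n > \varepsilon) \le P(S_n' > n\varepsilon') = P(tS_n'/p > tn\varepsilon'/p) \le \frac{E[e^{tS_n'/p}]}{e^{tn\varepsilon'/p}} \qquad (21)$$

for some $0 < \varepsilon' < \varepsilon$. Applying Condition (a), we then have

$$E[e^{tS_n'/p}] = E\big[e^{t\sum_{i=1}^{k-1}\xi_i/p}\,e^{t\xi_k/p}\big] = \big(1 + O(\lambda^{q(n)/2})\big)\,E\big[e^{t\sum_{i=1}^{k-1}\xi_i/p}\big]\,E[e^{t\xi_k/p}]. \qquad (22)$$

An iterative application of (22) gives us that for any $0 < t < 1$,

$$E[e^{tS_n'/p}] = \big(1 + O(\lambda^{q(n)/2})\big)^{k-1}\big(E[e^{t\xi_1/p}]\big)^{k} \qquad (23)$$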



as $n$ goes to infinity. If Condition (d) holds, by Lemma 3.3, as $n$ goes to infinity (and hence $p, q$ go to infinity), we have

$$E[\xi_1^2]/p^2 = o(1),$$

which trivially holds when Condition (d) fails. It then follows that for any $0 < t < 1$,

$$E[e^{t\xi_1/p}] = 1 + o(1)t^2,$$
and furthermore, for $t > 0$ sufficiently small, we have

$$\frac{E[e^{t\xi_1/p}]}{e^{t\varepsilon'}} = \frac{1 + o(1)t^2}{1 + t\varepsilon' + O(1)t^2} < 1. \qquad (24)$$

Now, from (21), (23) and (24), we deduce that for any $\varepsilon > 0$, there exists $\varepsilon' > 0$ such that

$$P(S_n/n > \varepsilon) \le \big(1 + O(\lambda^{q(n)/2})\big)^{k}\big(E[e^{t\xi_1/p}]/e^{t\varepsilon'}\big)^{k}.$$

Notice that for sufficiently large $n$, we have

$$\big(1 + O(\lambda^{q(n)/2})\big)E[e^{t\xi_1/p}]/e^{t\varepsilon'} < 1.$$

From this, with $a > 0$ chosen sufficiently small, we conclude that for any $x, \varepsilon > 0$, there exists $0 < \gamma < 1$ such that

$$P(S_n/n > x) = O(\gamma^{n^{1-\varepsilon}}).$$

The proof is then complete.
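The proof rests on the exponential Markov inequality in (21). In the simplest case of independent, identically distributed summands (so that no mixing correction such as the factor $1 + O(\lambda^{q(n)/2})$ appears), the same mechanism can be checked exactly. The Python sketch below, a minimal illustration assuming fair $\pm 1$ summands, compares the exact tail of $S_n$ with the optimized Chernoff bound $\inf_{t>0}(\cosh t)^n e^{-tnx}$; the helper names are hypothetical.

```python
import math


def exact_tail(n, x):
    """P(S_n >= n * x) for S_n a sum of n independent fair +/-1 signs."""
    # S_n = 2K - n with K ~ Binomial(n, 1/2), so S_n >= n x  <=>  K >= n (1 + x) / 2.
    # round(..., 9) guards against floating-point error in n * (1 + x) / 2.
    k_min = math.ceil(round(n * (1 + x) / 2, 9))
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n


def chernoff_bound(n, x):
    """inf over t > 0 of cosh(t)**n * exp(-t n x) = E[exp(t S_n)] / exp(t n x), for 0 < x < 1."""
    t = math.atanh(x)  # the minimizing t solves tanh(t) = x
    return math.cosh(t) ** n * math.exp(-t * n * x)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, exact_tail(n, 0.2), chernoff_bound(n, 0.2))
```

The bound decays like $e^{-nI(x)}$, which is the i.i.d. analogue of the $\gamma^{n^{1-\varepsilon}}$ rate obtained above; the blocking argument loses the factor $n^{-\varepsilon}$ in the exponent.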

3.6 Alternatives for Condition (d) 

Note that, for the case where $Z$ is in fact a Markov chain, a rather explicit alternative to Condition (d) has been derived in [32]. This section only assumes Conditions (a), (b) and (c), and gives alternatives to Condition (d) under these conditions.

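Let $\mathcal{F}$ denote the $\sigma$-field and $P$ the probability measure of the probability space on which $Z$ is defined, and let $H_0 = H(Z_k, k \in \mathbb{Z})$ be the subspace of $L^2(\mathcal{F})$ spanned by the equivalence classes of the random variables $Z_k$, $k \in \mathbb{Z}$, with inner product defined as

$$\langle V, W\rangle = E[VW]$$

for any $V, W \in H_0$.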

Lemma 3.18. If $\liminf_{n\to\infty}E[S_n^2] < \infty$, then there exists a sequence of random variables $\{V_i, i \in \mathbb{N}\}$ such that $X_i = V_i - V_{i+1}$ with $E[V_i^2] = O(1)$ uniformly for all $i$, and thus $\sup_n E[S_n^2] < \infty$.

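Proof. Let $Q$ be an infinite subset of $\mathbb{N}$ such that $\sup_{n\in Q}E[S_n^2] < \infty$. Applying Condition (c) and the stationarity of $Z$, we have, for any $n, m$,

$$E[(S_{n+m} - S_m)^2] = E\Big[\Big(\sum_{i=m+1}^{n+m}g(Z_1^{i})\Big)^2\Big] = E\Big[\Big(\sum_{i=m+1}^{n+m}\big(g(Z_{m+1}^{i}) + O(\rho^{i-m})\big)\Big)^2\Big] = E[S_n^2] + O(E[|S_n|]) + O(1) = E[S_n^2] + O(E[S_n^2]^{1/2}) + O(1).$$

We then deduce that there exists $C > 0$ such that for all $i \in \mathbb{N}$,

$$\sup_{n\in Q}E[(S_{n+i-1} - S_{i-1})^2] \le C,$$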
where $S_0$ is interpreted as 0. It follows from the Banach-Alaoglu theorem (which implies that every closed ball in a Hilbert space is weakly compact, so that every bounded sequence has a weakly convergent subsequence; see Section 3.15 of [35]) that for any $i \in \mathbb{N}$, there exist $V_i \in H_0$ with $E[V_i^2] \le C$, and $Q_i$, an infinite subset of $Q$, such that for all $W \in H_0$,

$$\lim_{n\to\infty,\, n\in Q_i}\langle W, S_{n+i-1} - S_{i-1}\rangle = \langle W, V_i\rangle;$$

here, without loss of generality, we can assume that $Q_{i+1} \subset Q_i$ for all $i$. Then one verifies that for any $W \in H_0$ and any $i$,

$$\langle W, X_i - V_i + V_{i+1}\rangle = \lim_{n\to\infty,\, n\in Q_{i+1}}\big\langle W, X_i - (S_{n+i-1} - S_{i-1}) + (S_{n+i} - S_i)\big\rangle = \lim_{n\to\infty,\, n\in Q_{i+1}}\langle W, X_{n+i}\rangle = 0,$$

where we have applied Lemma [3m for the last equality. Choosing $W = X_i - V_i + V_{i+1}$, we then obtain that

$$\|X_i - V_i + V_{i+1}\|^2 = 0,$$

which implies that

$$X_i = V_i - V_{i+1} \quad \text{a.s.}$$

It then follows that

$$E[S_n^2] = E[(V_1 - V_{n+1})^2] = E[V_1^2] + E[V_{n+1}^2] - 2E[V_1V_{n+1}],$$

which, together with $E[V_i^2] \le C$, implies the lemma.

□

A sequence of positive numbers $\{h(i), i \in \mathbb{N}\}$ is said to be slowly varying if for every positive integer $m$,

$$\lim_{n\to\infty}h(mn)/h(n) = 1,$$

and it is said to be slowly varying in the strong sense if

$$\lim_{m\to\infty}\frac{\max_{m\le n\le 2m}h(n)}{\min_{m\le n\le 2m}h(n)} = 1.$$

Lemma 3.19. If $\lim_{n\to\infty}E[S_n^2] = \infty$, then $E[S_n^2] = nh(n)$, where $\{h(i), i \in \mathbb{N}\}$ is a slowly varying sequence of positive numbers.

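Proof. Here $\sigma_n^2 = E[S_n^2]$. We only need to show that for every positive integer $l$,

$$\lim_{n\to\infty}\frac{\sigma_{ln}^2}{\sigma_n^2} = l.$$

Following [26], we use the Bernstein blocking method in the following way: we consecutively partition the partial sum $S_{ln}$ into blocks $\zeta_1, \eta_1, \zeta_2, \eta_2, \ldots$ such that each $\zeta_i$ is of length $n$ and each $\eta_i$ is of length $r = \lfloor\log\sigma_n\rfloor$. In other words, for any feasible $i$,

$$\zeta_i = \sum_{s=1}^{n}X_{(i-1)(n+r)+s}, \qquad \eta_i = \sum_{s=1}^{r}X_{(i-1)(n+r)+n+s}.$$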
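Now we estimate the second moments of the blocks and their cross terms. It follows from Lemma 3.3 that for any $j$,

$$E[\zeta_j^2] = \sigma_n^2 + O(\sigma_n). \qquad (25)$$

Using an argument similar to the proof of Part 2 of Lemma 3.1, one has that there exists $0 < \theta < 1$ such that for $i \ne j$,

$$|E[\zeta_i\zeta_j]| = O(\theta^{r}\sigma_n^2),$$

where we also used (25). Using the Schwarz inequality and (25), we also have

$$|E[\zeta_i\eta_j]| \le E[\zeta_i^2]^{1/2}E[\eta_j^2]^{1/2} = O(\sigma_n\sigma_r) = O(\sigma_n\log\sigma_n),$$

and

$$|E[\eta_i\eta_j]| \le O(\sigma_r^2) = O((\log\sigma_n)^2).$$

It then follows that for any positive integer $l$,

$$\sigma_{ln}^2 = l\sigma_n^2 + o(\sigma_n^2) \quad \text{as } n \to \infty,$$

which immediately implies the lemma. □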

Lemma 3.20. If $\lim_{n\to\infty}E[S_n^2] = \infty$, then $E[S_n^2] = nh(n)$, where $\{h(i), i \in \mathbb{N}\}$ is a sequence of positive numbers that is slowly varying in the strong sense.

Proof. Note that by Lemma 3.3, we have that for any $j$,

$$E[(S_{n+j} - S_j)^2] = E[S_n^2] + O(E[S_n^2]^{1/2}) + O(1) \qquad (26)$$

uniformly in $j$. The lemma then follows from (26), Lemma 3.19 and almost the same proof as that of Theorem 8.13 of [10]. □



The following lemma is well known; see, e.g., Proposition 0.16 in [10].

Lemma 3.21. Suppose $\{h(n), n \in \mathbb{N}\}$ is a sequence of positive numbers which is slowly varying in the strong sense. Then for every $\varepsilon > 0$, one has that $n^{\varepsilon}h(n) \to \infty$ as $n \to \infty$.
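For intuition about these definitions and Lemma 3.21, the Python sketch below evaluates one standard slowly varying example, $h(n) = 1/\log(n+1)$, which is our own choice and not taken from the paper. It shows $h(2n)/h(n)$ creeping toward 1 and $n^{0.1}h(n)$ eventually growing without bound, even though it first decreases; this is a numerical illustration only, not a proof.

```python
import math


def h(n):
    """A standard slowly varying example (hypothetical choice): h(n) = 1 / log(n + 1)."""
    return 1.0 / math.log(n + 1)


if __name__ == "__main__":
    # Slow variation: h(m n) / h(n) -> 1 (here m = 2), although convergence is slow.
    for n in (10**3, 10**6, 10**12):
        print(n, h(2 * n) / h(n))
    # Growth as in Lemma 3.21: n**eps * h(n) -> infinity, here with eps = 0.1.
    for n in (1e10, 1e30, 1e60):
        print(n, n ** 0.1 * h(n))
```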

Lemma 3.22. If $\lim_{n\to\infty}E[S_n^2] = \infty$, then $\sigma > 0$.

Proof. Assume, by contradiction, that $\sigma = 0$. Since $\lim_{n\to\infty}E[S_n^2] = \infty$, we deduce, by Lemma 3.20, that $E[S_n^2]/n$ is slowly varying in the strong sense. Then, by Lemma 3.21, for any $a > 0$, $n^{a}E[S_n^2]/n \to \infty$ as $n \to \infty$. However, by Lemma 3.3, when $\sigma = 0$, $n^{a}E[S_n^2]/n \to 0$ as $n \to \infty$ for any $0 < a < 1$, which is a contradiction. □

The following theorem, which gives alternatives to Condition (d) when Conditions (a), (b) and (c) are satisfied, immediately follows from Lemma 3.18 and Lemma 3.22.

Theorem 3.23. Under Conditions (a), (b) and (c), the following statements are equivalent:

1. $\sigma > 0$.

2. $\lim_{n\to\infty}E[S_n^2] = \infty$.

3. $\limsup_{n\to\infty}E[S_n^2] = \infty$.
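In the simplest special case, where $X_i$ is a function of a finite stationary Markov chain, the dichotomy behind Lemma 3.18 and Theorem 3.23 can be seen exactly. The Python sketch below is an illustration under our own assumptions (a hypothetical two-state chain $Y$ with transition matrix $P$ and stationary vector $\pi$): for $f$ the indicator of state 0, $E[S_n^2]/n$ converges to $\sigma^2 = 34/27 > 0$, whereas for the coboundary $X_i = g(Y_i) - g(Y_{i+1})$ the partial sums telescope and $E[S_n^2]$ stays bounded, so $\sigma = 0$.

```python
def markov_var_partial_sum(P, pi, f, n):
    """Exact Var(f(Y_1) + ... + f(Y_n)) for a stationary Markov chain (P, pi),
    using Var(S_n) = n*gamma_0 + 2*sum_{k=1}^{n-1} (n - k)*gamma_k."""
    size = len(pi)
    mean = sum(pi[i] * f[i] for i in range(size))
    h = list(f)  # h = P^k f, starting with k = 0
    var = 0.0
    for k in range(n):
        gamma_k = sum(pi[i] * f[i] * h[i] for i in range(size)) - mean ** 2  # Cov(f(Y_0), f(Y_k))
        var += (n if k == 0 else 2 * (n - k)) * gamma_k
        h = [sum(P[i][j] * h[j] for j in range(size)) for i in range(size)]
    return var


def coboundary_var(P, pi, g, n):
    """Var(S_n) for X_i = g(Y_i) - g(Y_{i+1}); S_n telescopes to g(Y_1) - g(Y_{n+1}), which has mean 0."""
    size = len(pi)
    Pn = [[float(i == j) for j in range(size)] for i in range(size)]  # P^0 = identity
    for _ in range(n):
        Pn = [[sum(Pn[i][k] * P[k][j] for k in range(size)) for j in range(size)] for i in range(size)]
    return sum(pi[i] * Pn[i][j] * (g[i] - g[j]) ** 2 for i in range(size) for j in range(size))


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical transition matrix
    pi = [2 / 3, 1 / 3]           # its stationary vector
    for n in (10, 100, 1000):
        print(n, markov_var_partial_sum(P, pi, [1.0, 0.0], n) / n,  # tends to 34/27
              coboundary_var(P, pi, [1.0, 0.0], n))                # stays below 4/9
```

Here $\sigma^2 = 34/27$ follows from $\gamma_k = (2/9)\,0.7^k$, since $0.7$ is the second eigenvalue of $P$.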
4 Proofs of the Main Results



Unless specified otherwise, all the lemmas in this section only assume Condition (I). 

For each $z \in \mathcal{Z}$, let $A_z$ denote the matrix such that $A_z(i, j) = A(i, j)p(z \mid j)$ for all feasible $i, j$; obviously $\sum_{z\in\mathcal{Z}}A_z = A$. One also observes that for any $z_{m_1}^{m_2}$,

$$p(z_{m_1}^{m_2}) = \pi A_{z_{m_1}^{m_2}}\mathbf{1},$$

where $\pi$ is the stationary vector of $A$, $\mathbf{1}$ denotes the all-one column vector and $A_{z_{m_1}^{m_2}} = A_{z_{m_1}}A_{z_{m_1+1}}\cdots A_{z_{m_2}}$. Since $A$ is irreducible and aperiodic, $A^{m_2-m_1}$ is strictly positive if $m_2 - m_1$ is large enough. Since $\Pi$ is strictly positive, by reblocking the process $Z$ if necessary, we may assume that all $A_z$ are positive. It then follows from the argument in [18] and the quotient rule (for taking the derivatives) that

Lemma 4.1. For any $l \ge 0$ and any compact subset $\Omega_0 \subset \Omega$, there exists $C > 0$ such that for any $z_{-n}^{0}$ and any $\theta \in \Omega_0$,

$$|D_\theta^l\log p^\theta(z_0 \mid z_{-n}^{-1})| \le C.$$
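The matrix-product identity $p(z_{m_1}^{m_2}) = \pi A_{z_{m_1}^{m_2}}\mathbf{1}$ stated before Lemma 4.1 can be checked against a brute-force sum over hidden paths. The Python sketch below does this for a hypothetical two-state chain; the matrices `A` and `Pi` (with `Pi[y][z]` $= p(z \mid y)$) are illustrative choices, and the convention that the product starts from $Y_0 \sim \pi$ follows the identity above.

```python
import itertools
import math


def emission_matrices(A, Pi):
    """Return [A_z for each output z], where A_z(i, j) = A(i, j) * p(z | j) and Pi[j][z] = p(z | j)."""
    n_states, n_out = len(A), len(Pi[0])
    return [[[A[i][j] * Pi[j][z] for j in range(n_states)] for i in range(n_states)]
            for z in range(n_out)]


def seq_prob(pi, Az, zs):
    """p(z_1, ..., z_n) = pi A_{z_1} ... A_{z_n} 1, multiplying the row vector left to right."""
    row = list(pi)
    for z in zs:
        M = Az[z]
        row = [sum(row[i] * M[i][j] for i in range(len(row))) for j in range(len(row))]
    return sum(row)  # multiplication by the all-one column vector


def brute_force_prob(pi, A, Pi, zs):
    """Same probability, summing over every hidden path (y_0, y_1, ..., y_n) with y_0 ~ pi."""
    n_states = len(pi)
    total = 0.0
    for path in itertools.product(range(n_states), repeat=len(zs) + 1):
        p = pi[path[0]]
        for t, z in enumerate(zs, start=1):
            p *= A[path[t - 1]][path[t]] * Pi[path[t]][z]
        total += p
    return total


if __name__ == "__main__":
    A = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical transition matrix of Y
    Pi = [[0.8, 0.2], [0.3, 0.7]]  # hypothetical channel matrix
    pi = [2 / 3, 1 / 3]            # stationary vector of A
    Az = emission_matrices(A, Pi)
    zs = (0, 1, 1, 0)
    print(seq_prob(pi, Az, zs), brute_force_prob(pi, A, Pi, zs))
```

The matrix-product form costs time linear in $n$, whereas the brute-force sum grows exponentially, which is why the product form is the natural object for the estimates in this section.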

For $\delta > 0$, let $\mathbb{C}_{\mathbb{R}^+}[\delta]$ denote the "relative" $\delta$-neighborhood of $\mathbb{R}^+ = \{x \in \mathbb{R} : x > 0\}$ within $\mathbb{C}$, i.e.,

$$\mathbb{C}_{\mathbb{R}^+}[\delta] = \{z \in \mathbb{C} : |z - x| < \delta x \text{ for some } x > 0\}.$$

Let $\mathbb{C}^m(r)$ denote the $r$-neighborhood of $\theta$ in $\mathbb{C}^m$. It turns out that, for $r > 0$ small enough, $p^\theta(z_0 \mid z_{-n}^{-1})$ and $H^\theta(Z_0 \mid Z_{-n}^{-1})$ can be analytically continued to all $\theta \in \mathbb{C}^m(r)$. Using the fact that an $n \times n$ positive matrix induces a contraction mapping on the interior of the $(n-1)$-dimensional real simplex under the Hilbert metric [38], the following lemma has been established in [18] (see also a more direct proof in [19] using a complex Hilbert metric).

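Lemma 4.2. 1. For any $\delta > 0$, there exists $r > 0$ such that for any $\theta \in \mathbb{C}^m(r)$ and for any $z_{-n}^{0}$,

$$p^\theta(z_0 \mid z_{-n}^{-1}) \in \mathbb{C}_{\mathbb{R}^+}[\delta].$$

2. There exist $C > 0$, $0 < \rho < 1$ and $r > 0$ such that for any two hidden Markov sequences $z_{-m}^{0}$, $\hat z_{-\hat m}^{0}$ with $z_{-n}^{0} = \hat z_{-n}^{0}$ (here $m, \hat m \ge n \ge 0$) and all $\theta \in \mathbb{C}^m(r)$, we have

$$|p^\theta(z_0 \mid z_{-m}^{-1}) - p^\theta(\hat z_0 \mid \hat z_{-\hat m}^{-1})| \le C\rho^{n}$$

and

$$|\log p^\theta(z_0 \mid z_{-m}^{-1}) - \log p^\theta(\hat z_0 \mid \hat z_{-\hat m}^{-1})| \le C\rho^{n}, \qquad |E_{\theta_0}[\log p^\theta(Z_0 \mid Z_{-m}^{-1})] - E_{\theta_0}[\log p^\theta(Z_0 \mid Z_{-\hat m}^{-1})]| \le C\rho^{n}.$$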

Together with the Cauchy integral formula, the above lemma immediately implies the following corollary.
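Corollary 4.3. For any $l \ge 0$ and any compact subset $\Omega_0 \subset \Omega$, there exist $C > 0$ and $0 < \rho < 1$ such that for any two hidden Markov sequences $z_{-m}^{0}$, $\hat z_{-\hat m}^{0}$ with $z_{-n}^{0} = \hat z_{-n}^{0}$ (here $m, \hat m \ge n \ge 0$) and any $\theta \in \Omega_0$,

$$|D_\theta^l p^\theta(z_0 \mid z_{-m}^{-1}) - D_\theta^l p^\theta(\hat z_0 \mid \hat z_{-\hat m}^{-1})| \le C\rho^{n},$$

and

$$|D_\theta^l\log p^\theta(z_0 \mid z_{-m}^{-1}) - D_\theta^l\log p^\theta(\hat z_0 \mid \hat z_{-\hat m}^{-1})| \le C\rho^{n}, \qquad |E_{\theta_0}[D_\theta^l\log p^\theta(Z_0 \mid Z_{-m}^{-1})] - E_{\theta_0}[D_\theta^l\log p^\theta(Z_0 \mid Z_{-\hat m}^{-1})]| \le C\rho^{n}.$$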
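Part 2 of Lemma 4.2, and hence Corollary 4.3, says that the conditional probability of $z_0$ forgets the distant past exponentially fast. The Python sketch below illustrates this for one hypothetical two-state example (the matrices and the helper `suffix_sensitivity` are our own choices, not the paper's): it compares $p(z_0 \mid \cdot)$ for two pasts that agree only on their last $n$ symbols. It is a numerical illustration, not a proof; the per-step renormalization in `predictive_prob` rescales the row vector and leaves the ratio unchanged.

```python
def emission_matrices(A, Pi):
    """A_z(i, j) = A(i, j) * p(z | j), with Pi[j][z] = p(z | j)."""
    n = len(A)
    return [[[A[i][j] * Pi[j][z] for j in range(n)] for i in range(n)] for z in range(len(Pi[0]))]


def predictive_prob(pi, Az, past, z0):
    """p(z0 | past) = pi A_past A_{z0} 1 / pi A_past 1, renormalizing the row vector at each step."""
    row = list(pi)
    for z in past:
        M = Az[z]
        row = [sum(row[i] * M[i][j] for i in range(len(row))) for j in range(len(row))]
        total = sum(row)
        row = [r / total for r in row]  # rescaling avoids underflow and does not change the ratio
    M = Az[z0]
    return sum(row[i] * M[i][j] for i in range(len(row)) for j in range(len(row)))


def suffix_sensitivity(pi, Az, n, prefix_len=20, z0=0):
    """|p(z0 | past1) - p(z0 | past2)| for two pasts sharing only their last n symbols."""
    common = [1 if k % 3 == 0 else 0 for k in range(n)]  # arbitrary shared suffix
    past1 = [0] * prefix_len + common
    past2 = [1] * prefix_len + common
    return abs(predictive_prob(pi, Az, past1, z0) - predictive_prob(pi, Az, past2, z0))


if __name__ == "__main__":
    A = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical transition matrix
    Pi = [[0.8, 0.2], [0.3, 0.7]]  # hypothetical channel matrix
    Az = emission_matrices(A, Pi)
    for n in (0, 2, 5, 10, 20, 40):
        print(n, suffix_sensitivity([2 / 3, 1 / 3], Az, n))
```

Since all $A_z$ in this example are strictly positive, the printed differences shrink roughly geometrically in $n$, in line with the Hilbert-metric contraction mentioned before Lemma 4.2.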

It is well known [9] that a finite-state irreducible and aperiodic Markov chain is a $\psi$-mixing sequence, and the corresponding $\psi(n)$ decays exponentially as $n \to \infty$. The following lemma asserts that under Condition (I), $Z$ is a $\psi$-mixing sequence and the corresponding $\psi(n)$ decays exponentially as $n \to \infty$. An excellent survey on various mixing sequences can be found in [9]; for a comprehensive exposition of the vast literature on this subject, we refer to [10].



Lemma 4.4. $Z$ is a $\psi$-mixing sequence, and for any compact subset $\Omega_0 \subset \Omega$, there exist $C > 0$ and $0 < \lambda < 1$ such that for any positive $n$ and any $\theta \in \Omega_0$,

$$\psi(n) \le C\lambda^{n}.$$

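Proof. Note that for any positive $n, m, l$ and any $z_1^{m}$, $z_{-n-l}^{-n}$, we have

$$p(z_1^{m} \mid z_{-n-l}^{-n}) = \frac{\pi A_{z_{-n-l}^{-n}}A^{n}A_{z_1^{m}}\mathbf{1}}{\pi A_{z_{-n-l}^{-n}}\mathbf{1}} = \frac{\pi A_{z_{-n-l}^{-n}}}{\pi A_{z_{-n-l}^{-n}}\mathbf{1}}\,A^{n}A_{z_1^{m}}\mathbf{1}.$$

Let $\lambda_2$ denote the second largest (in modulus) eigenvalue of $A$. By the Perron-Frobenius theory (see, e.g., [38]), $|\lambda_2| < 1$; furthermore, for any $\lambda$ with $|\lambda_2| < \lambda < 1$, there exists $C_1 > 0$ such that for any probability vector $x$, we have

$$|xA^{n} - \pi| \le C_1\lambda^{n}.$$

Since $\pi$ is strictly positive, each entry of $A_{z_1^m}\mathbf{1}$ is bounded by $p(z_1^m)/\min_i\pi_i$. It then follows that

$$p(z_1^{m} \mid z_{-n-l}^{-n}) = \pi A_{z_1^{m}}\mathbf{1} + O(\lambda^{n})\,p(z_1^{m}) = p(z_1^{m}) + O(\lambda^{n})\,p(z_1^{m}).$$

Noting that the constant in $O(\lambda^{n})$ is independent of $n, m, l$ and $z_1^{m}$, $z_{-n-l}^{-n}$, we then conclude that for any $U \in \mathcal{B}(Z_{-\infty}^{-n})$ and $V \in \mathcal{B}(Z_1^{\infty})$,

$$P(V \mid U) = P(V) + O(\lambda^{n})P(V),$$

which immediately implies the lemma. □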
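The Markov chain fact quoted before Lemma 4.4, together with the Perron-Frobenius rate used in its proof, can be seen numerically. For a stationary finite Markov chain, a standard reduction gives $\psi(n) = \max_{i,j}|P^n(i,j)/\pi_j - 1|$. The Python sketch below computes this for a hypothetical two-state chain whose second eigenvalue is $0.7$; for this chain one can check by hand that $\psi(n) = 2 \cdot 0.7^n$ exactly. It concerns the Markov chain $Y$, not the hidden process $Z$ itself.

```python
import math


def mat_mult(X, Y):
    """Product of two square matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def psi_coefficients(P, pi, n_max):
    """psi(n) = max_{i,j} |P^n(i, j) / pi_j - 1| for n = 1, ..., n_max."""
    size = len(pi)
    Pn = [[float(i == j) for j in range(size)] for i in range(size)]  # P^0 = identity
    out = []
    for _ in range(n_max):
        Pn = mat_mult(Pn, P)
        out.append(max(abs(Pn[i][j] / pi[j] - 1) for i in range(size) for j in range(size)))
    return out


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical chain; eigenvalues 1 and 0.7
    pi = [2 / 3, 1 / 3]
    for n, psi in enumerate(psi_coefficients(P, pi, 10), start=1):
        print(n, psi, psi / (2 * 0.7 ** n))  # last column stays at 1.0 for this chain
```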

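In the following, we shall establish the main results by invoking the limit theorems in Section 2. Before doing so, we set

$$f(Z_i \mid Z_1^{i-1}) = D_\theta^l\log p^\theta(Z_i \mid Z_1^{i-1}), \qquad X_i = f(Z_i \mid Z_1^{i-1}) - E_{\theta_0}[f(Z_i \mid Z_1^{i-1})], \qquad (27)$$

and

$$S_n = \sum_{i=1}^{n}X_i, \qquad \sigma_n^2 = \mathrm{Var}(S_n), \qquad \sigma^2 = \lim_{n\to\infty}\sigma_n^2/n. \qquad (28)$$

Then, by Corollary 4.3 and Lemma 4.4, Conditions (a), (b) and (c) are satisfied.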
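4.1 Proof of Theorem 1.1

Note that for any $j' < j < i$, applying Corollary 4.3, we have

$$E_{\theta_0}[D_\theta^l\log p^\theta(Z_i \mid Z_{j}^{i-1})] - E_{\theta_0}[D_\theta^l\log p^\theta(Z_i \mid Z_{j'}^{i-1})] = O(\rho^{i-j}),$$

which implies that, as $j \to -\infty$, $E_{\theta_0}[D_\theta^l\log p^\theta(Z_i \mid Z_j^{i-1})]$ converges to a limit, say $E_{\theta_0}[D_\theta^l\log p^\theta(Z_i \mid Z_{-\infty}^{i-1})]$, such that

$$E_{\theta_0}[D_\theta^l\log p^\theta(Z_i \mid Z_1^{i-1})] - E_{\theta_0}[D_\theta^l\log p^\theta(Z_i \mid Z_{-\infty}^{i-1})] = O(\rho^{i-1}).$$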
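It then follows that

$$\frac{E_{\theta_0}[D_\theta^l\log p^\theta(Z_1^n)]}{n} = \frac{\sum_{i=1}^{n}E_{\theta_0}[D_\theta^l\log p^\theta(Z_i \mid Z_1^{i-1})]}{n} = \frac{\sum_{i=1}^{n}\big(E_{\theta_0}[D_\theta^l\log p^\theta(Z_i \mid Z_{-\infty}^{i-1})] + O(\rho^{i-1})\big)}{n},$$

which, by the stationarity of $Z$, converges to $E_{\theta_0}[D_\theta^l\log p^\theta(Z_0 \mid Z_{-\infty}^{-1})]$ as $n$ tends to infinity. This implies the well-definedness of $L^{(l)}(\theta)$.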

Now, with (27) and (28), invoking Theorem 2.1, we have

$$\frac{D_\theta^l\log p^\theta(Z_1^n)}{n} - \frac{E_{\theta_0}[D_\theta^l\log p^\theta(Z_1^n)]}{n} \to 0 \quad \text{as } n \to \infty,$$

which, by the definition of $L^{(l)}(\theta)$, implies the theorem.



4.2 Proof of Theorem 1.2

We will need the following lemma, whose proof follows from Corollary 4.3, Lemma 4.4 and a completely parallel argument as in the proof of Lemma 3.3, and is thus omitted.

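Lemma 4.5. Assume Conditions (I) and (II) and consider a compact subset $\Omega_0 \subset \Omega$ and any $l \ge 0$. For any $0 < \varepsilon_0 < 1$, there exists $C > 0$ such that for any $m, n$ and any $\theta \in \Omega_0$,

$$\Big|E_{\theta_0}\big[(S_{m+n} - S_m)^2\big] - n\big(\sigma^{(l)}(\theta)\big)^2\Big| \le Cn^{\varepsilon_0}.$$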

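Lemma 4.5 immediately implies that

$$\sigma_n^{(l)}(\theta) = O(\sqrt{n}), \qquad (29)$$

where $\sigma_n^{(l)}(\theta)$ denotes the quantity $\sigma_n$ in (28), and furthermore, together with Condition (II), that for any small $\varepsilon_0 > 0$, any $n$ and any $\theta \in \Omega_0$,

$$\big|\sigma_n^{(l)}(\theta) - \sqrt{n}\,\sigma^{(l)}(\theta)\big| = O(n^{-1/2+\varepsilon_0}). \qquad (30)$$

Notice that by Corollary 4.3,

$$E_{\theta_0}[D_\theta^l\log p^\theta(Z_1^n)] - nL^{(l)}(\theta) = O\Big(\sum_{i=1}^{\infty}\rho^{i}\Big) = O(1), \qquad (31)$$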



and by Lemma 4.1,

$$|D_\theta^l\log p^\theta(Z_1^n) - E_{\theta_0}[D_\theta^l\log p^\theta(Z_1^n)]| = O(n). \qquad (32)$$
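Applying (30), (31), (32) and (29), we then have, for some $\varepsilon_0 > 0$ sufficiently small,

$$\begin{aligned} &\Big|\frac{D_\theta^l\log p^\theta(Z_1^n) - nL^{(l)}(\theta)}{\sqrt{n}\,\sigma^{(l)}(\theta)} - \frac{D_\theta^l\log p^\theta(Z_1^n) - E_{\theta_0}[D_\theta^l\log p^\theta(Z_1^n)]}{\sigma_n^{(l)}(\theta)}\Big|\\ &\quad\le \big|D_\theta^l\log p^\theta(Z_1^n) - E_{\theta_0}[D_\theta^l\log p^\theta(Z_1^n)]\big|\,\Big|\frac{1}{\sqrt{n}\,\sigma^{(l)}(\theta)} - \frac{1}{\sigma_n^{(l)}(\theta)}\Big| + \frac{\big|E_{\theta_0}[D_\theta^l\log p^\theta(Z_1^n)] - nL^{(l)}(\theta)\big|}{\sqrt{n}\,\sigma^{(l)}(\theta)}\\ &\quad= O(n)\,O(n^{-3/2+\varepsilon_0}) + O(n^{-1/2}) = O(n^{-1/2+\varepsilon_0}). \end{aligned}$$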



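Finally, with (27) and (28), invoking Theorem 2.2, we have, for any small $\varepsilon_0 > 0$,

$$\sup_{x}\Big|P\Big(\frac{D_\theta^l\log p^\theta(Z_1^n) - E_{\theta_0}[D_\theta^l\log p^\theta(Z_1^n)]}{\sigma_n^{(l)}(\theta)} \le x\Big) - G(x)\Big| = O(n^{-1/4+\varepsilon_0}). \qquad (33)$$

By the preceding estimate, the two normalized quantities differ by at most $O(n^{-1/2+\varepsilon_0})$; since $G$ has a bounded derivative, it then follows from (33) that for any small $\varepsilon_0 > 0$,

$$P\Big(\frac{D_\theta^l\log p^\theta(Z_1^n) - nL^{(l)}(\theta)}{\sqrt{n}\,\sigma^{(l)}(\theta)} \le x\Big) = G(x) + O(n^{-1/2+\varepsilon_0}) + O(n^{-1/4+\varepsilon_0}) = G(x) + O(n^{-1/4+\varepsilon_0}).$$

We then have established the theorem.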



4.3 Proof of Theorem 1.3

With (27) and (28), invoking Theorem 2.3, we can redefine the process $\{S(t), t \ge 0\}$ on a richer probability space together with the standard Brownian motion $\{B(t), t \ge 0\}$ such that for any $\varepsilon > 0$,

$$\sum_{n\le t}D_\theta^l\log p^\theta(Z_n \mid Z_1^{n-1}) - \sum_{n\le t}E_{\theta_0}[D_\theta^l\log p^\theta(Z_n \mid Z_1^{n-1})] - B\big((\sigma^{(l)}(\theta))^2t\big) = O(t^{1/3+\varepsilon}) \quad \text{a.s. as } t \to \infty.$$

The theorem then follows from (31).
4.4 Proof of Theorem 1.4

With (27) and (28), invoking Theorem 2.4, we have

$$\limsup_{n\to\infty}\frac{D_\theta^l\log p^\theta(Z_1^n) - E_{\theta_0}[D_\theta^l\log p^\theta(Z_1^n)]}{\big(2n(\sigma^{(l)}(\theta))^2\log\log(n(\sigma^{(l)}(\theta))^2)\big)^{1/2}} = 1 \quad \text{a.s.}$$

The theorem then follows from (31).



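4.5 Proof of Theorem 1.5

With (27) and (28), invoking Theorem 2.5, we deduce that for any $x, \varepsilon > 0$, there exist $C > 0$ and $0 < \gamma < 1$ such that

$$P\Big(\Big|\frac{D_\theta^l\log p^\theta(Z_1^n) - E_{\theta_0}[D_\theta^l\log p^\theta(Z_1^n)]}{n}\Big| \ge x\Big) \le C\gamma^{n^{1-\varepsilon}}.$$

The theorem then follows from (31).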

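4.6 Proof of Theorem 1.6

Again, in this proof, we treat $\theta$ as a one-dimensional variable; without loss of generality, we further assume that $L^{(2)}(\theta) > 0$ for all $\theta \in \Omega_0$.

By the mean value theorem, for any $\theta_n$, there exists $\tilde\theta_n$, a convex combination of $\theta_0$ and $\theta_n$, such that

$$D_\theta\log p^{\theta_n}(Z_1^n) = D_\theta\log p^{\theta_0}(Z_1^n) + D_\theta^2\log p^{\tilde\theta_n}(Z_1^n)(\theta_n - \theta_0).$$

And, by the definition of $\theta_n$,

$$D_\theta\log p^{\theta_n}(Z_1^n) = 0.$$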

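It then follows that for any $x > 0$,

$$P(|\theta_n - \theta_0| > x) = P\Big(\Big|\frac{D_\theta\log p^{\theta_0}(Z_1^n)}{n}\Big| > \Big|\frac{D_\theta^2\log p^{\tilde\theta_n}(Z_1^n)}{n}\Big|\,x\Big).$$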

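It follows from the nonnegativity of the relative entropy [12] that for all $\theta \in \Omega$ and for all $n$,

$$E_{\theta_0}[\log p^{\theta}(Z_1^n)] \le E_{\theta_0}[\log p^{\theta_0}(Z_1^n)],$$

which implies that

$$E_{\theta_0}[D_\theta\log p^{\theta_0}(Z_1^n)] = 0, \quad \text{and thus} \quad L^{(1)}(\theta_0) = 0.$$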
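The inequality $E_{\theta_0}[\log p^{\theta}(Z_1^n)] \le E_{\theta_0}[\log p^{\theta_0}(Z_1^n)]$ and the resulting zero derivative at $\theta_0$ can be checked exactly for short sequences by enumerating all outputs. The Python sketch below uses a hypothetical one-parameter family (a symmetric two-state chain with flip probability $\theta$ and a fixed channel `PI_EMIT`); the family, the grid and the helper names are our own illustrative assumptions.

```python
import itertools
import math

PI_EMIT = [[0.8, 0.2], [0.3, 0.7]]  # hypothetical channel, PI_EMIT[y][z] = p(z | y)


def transition(theta):
    """Hypothetical family: symmetric 2-state chain; its stationary vector is (1/2, 1/2)."""
    return [[1 - theta, theta], [theta, 1 - theta]]


def seq_prob(theta, zs):
    """p^theta(z_1, ..., z_n) via the matrix-product (forward) recursion."""
    A = transition(theta)
    row = [0.5, 0.5]
    for z in zs:
        row = [sum(row[i] * A[i][j] * PI_EMIT[j][z] for i in range(2)) for j in range(2)]
    return sum(row)


def expected_loglik(theta, theta0, n):
    """E_{theta0}[log p^theta(Z_1^n)], computed exactly by summing over all 2**n outputs."""
    return sum(seq_prob(theta0, zs) * math.log(seq_prob(theta, zs))
               for zs in itertools.product(range(2), repeat=n))


if __name__ == "__main__":
    theta0, n = 0.2, 4
    for k in range(1, 10):
        theta = k / 20
        print(theta, expected_loglik(theta, theta0, n))  # largest at theta = theta0
```

A central difference of `expected_loglik` at $\theta_0$ is numerically zero, matching $E_{\theta_0}[D_\theta\log p^{\theta_0}(Z_1^n)] = 0$.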



Then, by Theorem 1.5, for any $x_1, \varepsilon_1 > 0$, there exist $C_1 > 0$ and $0 < \gamma_1 < 1$ such that for any $n$ and any $\theta \in \Omega_0$,

$$P(F(x_1)) \le C_1\gamma_1^{n^{1-\varepsilon_1}},$$
where $F(x_1)$ denotes the event that

$$\Big|\frac{D_\theta^2\log p^{\tilde\theta_n}(Z_1^n)}{n} - L^{(2)}(\tilde\theta_n)\Big| \ge x_1.$$



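By Theorem 1.2, for any $x_1, \varepsilon_2 > 0$, there exists $C_2 > 0$ such that

$$P(|\theta_n - \theta_0| > x, F^c(x_1)) \le P\big(|\mathcal{N}| > (L^{(2)}(\tilde\theta_n) - x_1)\sqrt{n}\,x\big) + C_2n^{-1/4+\varepsilon_2},$$

where $\mathcal{N}$ denotes a standard normal random variable. It then follows that

$$\begin{aligned} P(|\theta_n - \theta_0| > x) &= P(|\theta_n - \theta_0| > x, F^c(x_1)) + P(|\theta_n - \theta_0| > x, F(x_1))\\ &\le P\big(|\mathcal{N}| > (L^{(2)}(\tilde\theta_n) - x_1)\sqrt{n}\,x\big) + C_2n^{-1/4+\varepsilon_2} + P(F(x_1))\\ &\le e^{-(L^{(2)}(\tilde\theta_n) - x_1)^2nx^2/2} + C_2n^{-1/4+\varepsilon_2} + C_1\gamma_1^{n^{1-\varepsilon_1}}, \end{aligned}$$

where we have used the fact that for any $y > 0$,

$$P(|\mathcal{N}| > y) \le e^{-y^2/2}.$$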

The theorem then immediately follows if we choose $x_1 > 0$ sufficiently small such that for all $\theta \in \Omega_0$,

$$L^{(2)}(\theta) - x_1 > 0.$$
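The Gaussian tail bound $P(|\mathcal{N}| > y) \le e^{-y^2/2}$ used in the last step can be spot-checked numerically through $P(|\mathcal{N}| > y) = \mathrm{erfc}(y/\sqrt{2})$. The Python sketch below, with a hypothetical helper name, evaluates both sides on a grid; a grid check is an illustration, not a proof (the inequality itself follows from $\mathrm{erfc}(u) \le e^{-u^2}$ for $u \ge 0$).

```python
import math


def normal_two_sided_tail(y):
    """P(|N| > y) for a standard normal N, via the complementary error function."""
    return math.erfc(y / math.sqrt(2.0))


if __name__ == "__main__":
    for y in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(y, normal_two_sided_tail(y), math.exp(-y * y / 2))
```

Equality holds at $y = 0$, and the bound becomes loose by a factor of order $y$ for large $y$, which does not affect the exponential rate needed in the proof.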